TLC Trip Record Data
Yellow and green taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP). The trip data was not created by the TLC, and TLC makes no representations as to the accuracy of these data.
For-Hire Vehicle (“FHV”) trip records include fields capturing the dispatching base license number and the pick-up date, time, and taxi zone location ID (shape file below). These records are generated from the FHV Trip Record submissions made by bases. Note: The TLC publishes base trip record data as submitted by the bases, and we cannot guarantee or confirm their accuracy or completeness. Therefore, this may not represent the total amount of trips dispatched by all TLC-licensed bases. The TLC performs routine reviews of the records and takes enforcement actions when necessary to ensure, to the extent possible, complete and accurate information.
On 05/13/2022, we are making the following changes to trip record files:
- All files will be stored in the PARQUET format. Please see the ‘Working With PARQUET Format’ under the Data Dictionaries and MetaData section.
- Trip data will be published monthly (with a two-month delay) instead of bi-annually.
- HVFHV files will now include 17 more columns (please see High Volume FHV Trips Dictionary for details). Additional columns will be added to the old files as well. The earliest date to include additional columns: February 2019.
- Yellow trip data will now include 1 additional column (‘airport_fee’, please see Yellow Trips Dictionary for details). The additional column will be added to the old files as well. The earliest date to include the additional column: January 2011.
Due to COVID-19 and its impact on the daily operations of small businesses, TLC granted smaller bases an extension on trip record submissions. Trip and trip-related data for these bases will be updated as it becomes available.
Data Dictionaries and MetaData
- Trip Record User Guide
- Yellow Trips Data Dictionary
- Green Trips Data Dictionary
- FHV Trips Data Dictionary
- High Volume FHV Trips Data Dictionary
- Working With PARQUET Format
Taxi Zone Maps and Lookup Tables
- Taxi Zone Lookup Table (CSV)
- Taxi Zone Shapefile (PARQUET)
- Taxi Zone Map – Bronx (JPG)
- Taxi Zone Map – Brooklyn (JPG)
- Taxi Zone Map – Manhattan (JPG)
- Taxi Zone Map – Queens (JPG)
- Taxi Zone Map – Staten Island (JPG)
- 09/08/2017 - FHV trip record files from June 2017 updated as of 09/08/2017
- 08/30/2017 - FHV trip record files from July 2016 through June 2017 updated as of 08/16/2017
- 03/13/2017 - FHV trip record files from January 2016 through December 2016 updated as of 02/14/2017
- 09/22/2015 - TPEP and LPEP trip data PARQUETs from January through June 2015 have been updated to include a new field [improvement_surcharge] which lists the itemized portion of the fare covering the Taxicab Improvement Surcharge or Street Hail Livery Improvement Surcharge. This is a $0.30 surcharge on all trips to help fund accessibility in taxis and SHLs, which began on January 1, 2015. All TPEP and LPEP trip data files uploaded moving forward will also include this new field.
Build a model that predicts the total ride duration of taxi trips in New York City
dsankush/NYC-Taxi-Trip-Time-Prediction
AlmaBetter Verified Project - AlmaBetter School
NYC Taxi Trip Time Prediction
Developed various models to predict the total ride duration of taxi trips in New York City
Data Description
The dataset is based on the 2016 NYC Yellow Cab trip record data made available in Big Query on Google Cloud Platform. The data was originally published by the NYC Taxi and Limousine Commission (TLC). The data was sampled and cleaned for the purposes of this project. Based on individual trip attributes, you should predict the duration of each trip in the test set.
NYC Taxi Data.csv - the training set (contains 1458644 trip records)
Project Files Description
This project includes one executable file, two documents, and one data file, as follows:
Executable Files:
- NYC_Taxi_Trip_Time_Prediction_Capstone_Project.ipynb - Includes all functions required for the regression operations.
Documents:
- Project presentation.docx - Contains the analysis as presented after completing the project.
- Project report.pdf - Contains the whole analysis strategy and the methodology followed for the project.
Data:
- NYC Taxi Data.csv - Includes all the data required for the regression task.
XGBoost (Ensemble Model)
Before turning to the mathematics of gradient boosting, consider a simple example of a CART that classifies whether someone will like a hypothetical computer game X.
A tree ensemble model predicts by summing the outputs of its trees, where K is the number of trees, each f is a function in the functional space F, and F is the set of possible CARTs. The objective function for this model is the sum of two terms: the first is the loss function and the second is the regularization term.
Instead of learning all the trees at once, which makes the optimization harder, we apply an additive strategy: keep what has already been learned and add one new tree at a time.
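In standard notation, the ensemble prediction, the regularized objective, and the additive update described above can be written as follows. This is a reconstruction based on the usual XGBoost formulation, not a copy of the original figures; the symbols l (loss), ω (per-tree complexity penalty), n (number of training examples), and t (boosting round) are the conventional ones and are assumptions here.

```latex
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F}

\mathrm{obj}(\theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \omega(f_k)

\hat{y}_i^{(0)} = 0, \qquad \hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)
```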
XGBoost minimizes a regularized (L1 and L2) objective function that combines a convex loss function (based on the difference between the predicted and target outputs) and a penalty term for model complexity (in other words, the regression tree functions).
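The additive idea can be illustrated with a minimal sketch in plain Python. It is not XGBoost itself and not the notebook's code: it fits one-split regression stumps to the residuals of a squared-error loss and omits the regularization term entirely. The data values, `fit_stump`, `boost`, and the learning rate are all hypothetical choices made for illustration.

```python
# Minimal gradient-boosting sketch for squared-error loss (illustration only).

def fit_stump(xs, residuals):
    """Return a one-split regression stump that best fits the residuals."""
    best = None
    # Skip the largest x so the right-hand side of the split is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=20, learning_rate=0.3):
    """Additive training: each new stump is fitted to the current residuals."""
    base = sum(ys) / len(ys)
    trees = []
    preds = [base] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(t(x) for t in trees)


def mse(model, xs, ys):
    """Mean squared error of a model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical trip distances (miles) and durations (minutes).
distances = [1, 2, 3, 4, 5, 6, 7, 8]
durations = [6, 9, 13, 15, 21, 24, 26, 31]

mean_duration = sum(durations) / len(durations)
baseline_mse = mse(lambda x: mean_duration, distances, durations)
boosted_mse = mse(boost(distances, durations), distances, durations)
```

On this toy data the boosted ensemble should fit the training points more closely than the constant mean prediction, which is the effect the additive strategy is designed to produce.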
Execution Instructions
The order of execution of the program files is as follows:
1) NYC_Taxi_Trip_Time_Prediction_Capstone_Project.ipynb
Execute NYC_Taxi_Trip_Time_Prediction_Capstone_Project.ipynb to access all the analysis done for the regression task.
Conclusions
1) Regression analysis was conducted to create a system that can assist taxi companies in determining the duration of cab trips, thereby improving their business models and enhancing customer availability.
2) The problem statement was resolved by using various regression techniques, including linear regression, decision trees and XGBOOST regressions.
3) Preparing the data for analysis and loading it into the machine learning models involved PCA transformation, feature engineering, and forward feature selection.
4) Several regression models were evaluated, and XGBoost performed better than the other models, with a high R2 score and a low RMSE (reported as 97% accuracy).
Future work
- The data set covers only about 6 months. More than a year of data, along with additional features, would give the models more informative signal to learn from and should improve their performance.
- Additional features would also make it possible to explore this kind of data in more depth.
Ankush Kumar | Data Science | Machine Learning Engineer | Deep Learning enthusiast
Contact me for Data Science Project Collaborations and Data Science related job roles
References
XGBoost documentation, which helped in understanding XGBoost in more depth.
Available: https://xgboost.readthedocs.io/en/stable/
GeeksforGeeks, which helped in understanding the working of XGBoost more efficiently and easily.
Available: https://www.geeksforgeeks.org/xgboost/
Medium.com, 'NYC Taxi Trip Duration Prediction using Machine Learning'. [Online].
Available: https://medium.com/@ShortHills_Tech/nyc-taxi-trip-duration-prediction-using-machine-learning-a92874bd761
YouTube.com, 'NYC taxi trip duration'. [Online].
Available: https://www.youtube.com/watch?v=p1OnQfFfJeU
ResearchGate, 'NYC Taxi Trip and Fare Data Analytics using BigData'. [Online].
Available: https://www.researchgate.net/publication/287205557_NYC_Taxi_Trip_and_Fare_Data_Analytics_using_BigData
Machine Learning Mastery, 'Regression Metrics for Machine Learning'. [Online].
Available: https://machinelearningmastery.com/regression-metrics-for-machine-learning/
Thank you so much for visiting
Cab Etiquette In NYC: All You Need to Know
We've all been there. You stayed out a little later than you planned, and you're a little worse for wear. You need to go to bed, but the city's unfamiliar to you. The public transport maps might as well be Jackson Pollock paintings. So you do what every single person does in films and TV shows based in New York. You raise your hand, and within seconds a yellow cab's pulled up beside you. Hopefully you're on your way in seconds and home safe and sound, but if anything seems off or you need help and advice, read on. Here's what you need to know about cab etiquette in NYC.
Can a cab driver ever refuse me service?
Yes, but only if the trip is more than 12 hours long, or if their "taxi" light is off. 12 hour+ journeys are against the law in the US, and only taxis with their lights on are currently working. If you're staying far out of the city centre, perhaps get in the cab before telling them where you're going. It might seem sneaky, but once you're in their cab they are legally obligated to take you to your destination. Crazy, right?
My taxi is loud and uncomfortable. What can I do?
A lot, thankfully. Riders have rights too, after all. If your driver is on a call or using their phone, they're being super illegal. Feel free to remind them. If the cab is too hot or cold, depending on the time of year, you can also request they put the air con/heating on. And if their music is too loud, by all means, politely ask them to turn it down or off. Just don't berate their choice of genre.
However, if the driver refuses these, or any reasonable requests, you have the right to get out at any time. And remember to take down their medallion number if you want to make a complaint. It's on their licence plate, the hood of the taxi, and on your receipt if you request one.
What if I'm being loud, and making the driver uncomfortable?
Firstly, why...would you... do that? Secondly, while drivers have no legal grounds to ask you to keep it down, have some respect for them. And for yourself. Driving a taxi all day is exhausting, and navigating the hectic streets that never sleep requires concentration. Cab etiquette in NYC, or anywhere, works both ways. Be respectful, and you'll likely earn their respect. And a safer and quicker journey home, too.
Should I stare at them creepily through the rear-view mirror?
No. No, don't. Why would you even...?
How much should I tip?
Tips are big business in New York, as they are in the rest of the US. But sadly you'll be expected to pay over the odds in the Big Apple. 20% of the fee is the recommended amount. If you're paying with card instead of cold hard cash, the amount of gratuity will automatically be added to the charge. It could go as high as 30%, so keep that in mind if you're squeezing pennies. Of course, if you're an out-of-towner and they've been helpful with info or recommendations, why not be a nice little human and show them your gratitude with money?
Tipping's the best way to thank them, but if you want to go above and beyond because they did, hop on the nyc.gov website and leave a glowing review, you selfless beauty.
If the driver asks for cash, is it OK to use my card instead?
Yes. Every taxi in NYC is required by law to take card, so if your driver says they don't have a machine or that it's broken, it's a ruse. Persist, and victory will be yours. Drivers may also mention they've selected "Cash" instead of "Card" and that they can't reverse the decision. This, too, is a ruse. Stay strong, and wait for the card machine. It's simply a case of them pressing a single button to make it happen. Also get your receipt - it contains lots of vital information like their medallion number which you'll need if you lose something in the cab, or want to make a complaint.
That's what you need to know about taxi etiquette in NYC. We hope these tips help. Of course, we're always open to suggestions, so if you have any other top tips you'd like to add, let us know in the comments below! Stay safe, travelers.
Abhishek Das
Lead Data Scientist at KPMG Digital Delta
Exploring NYC Taxi Data (Updated)
Scatterplot of all pickups and dropoffs in New York City
This post explores a subset of the NYC taxi dataset for the month of April 2013. I extract, transform and load the trip fare and trip details csv files into a sqlite database. I use this data to predict the fare and tip taxi drivers will receive. The repository containing my entire analysis is here and the presentation slides are available here as pptx and here as pdf.
The April 2013 taxi data is provided in two csv files: trip_details and trip_fares. I extract, transform and load these csv files into two separate tables: trips_table and fares_table in a SQLite database. I make subsequent calls to these tables in my EDA and modeling notebooks. The ETL process is detailed here .
Data Cleaning
Each table is cleaned for outliers, including restricting latitude and longitude co-ordinates to lie between (40.67, -74.027) and (40.85, -73.85). I remove trips that last for 0 minutes and 0 miles. I also restrict the dataset to trips made within New York City and its two closest airports: La Guardia and JFK International. This excludes trips made to Westchester and Nassau counties (Rate Code 4) as well as out of town trips (Rate Code 5), which are negotiated at a flat fee, so the odometer or trip time is not indicative of the distance or duration of the trip. While these actions may bias the results, 99.84% of all trips in April 2013 were made within the city of New York. A few thousand trips have their payment registered as "Disputed", "No Charge" or "Unknown". These trips were excluded because passengers overwhelmingly paid by credit card or cash. Finally, the NYC Taxi and Limousine Commission permits a maximum of 6 passengers in a 5 passenger taxicab, if the sixth passenger is a child under 7 who can sit on an adult's lap. I screen out all trips with more than 6 passengers.
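The cleaning rules above could be expressed as a single row filter. The sketch below is an illustration only, not the code from the original notebooks: the `Trip` record and its field names are hypothetical, and it reads "0 minutes and 0 miles" as both values being zero.

```python
from dataclasses import dataclass

# Hypothetical trip record; field names are assumptions, not the dataset's columns.
@dataclass
class Trip:
    pickup_lat: float
    pickup_lon: float
    dropoff_lat: float
    dropoff_lon: float
    minutes: float
    miles: float
    rate_code: int
    payment_type: str
    passengers: int

# Thresholds taken from the text above.
LAT_MIN, LAT_MAX = 40.67, 40.85
LON_MIN, LON_MAX = -74.027, -73.85
EXCLUDED_RATE_CODES = {4, 5}  # Westchester/Nassau and negotiated out-of-town fares
EXCLUDED_PAYMENTS = {"Disputed", "No Charge", "Unknown"}
MAX_PASSENGERS = 6


def in_bounding_box(lat, lon):
    """True if the point lies inside the latitude/longitude box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def keep_trip(trip):
    """Apply all cleaning rules to one trip."""
    return (
        in_bounding_box(trip.pickup_lat, trip.pickup_lon)
        and in_bounding_box(trip.dropoff_lat, trip.dropoff_lon)
        and not (trip.minutes == 0 and trip.miles == 0)
        and trip.rate_code not in EXCLUDED_RATE_CODES
        and trip.payment_type not in EXCLUDED_PAYMENTS
        and trip.passengers <= MAX_PASSENGERS
    )


good = Trip(40.76, -73.98, 40.75, -73.99, 12.0, 2.1, 1, "Credit", 1)
too_many_passengers = Trip(40.76, -73.98, 40.75, -73.99, 12.0, 2.1, 1, "Cash", 7)
out_of_town = Trip(40.76, -73.98, 40.75, -73.99, 45.0, 20.0, 5, "Cash", 2)
cleaned = [t for t in (good, too_many_passengers, out_of_town) if keep_trip(t)]
```

Keeping every rule in one predicate makes it easy to count how many rows each rule removes by toggling conditions individually.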
Data Merging
Using taxi medallion as the unique taxicab identifier, hack license as the unique driver identifier, and vendor id and pickup_datetime as additional common keys, I merge the two tables above and obtain just over 14 million rows of data. Going forward, I use the assignment questions to guide my EDA of the dataset. The complete data munging process is described in this notebook.
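A merge on these four keys could look like the following sketch, which uses Python's built-in `sqlite3` module with an in-memory database. The table names `trips_table` and `fares_table` come from the text; the exact column names, the sample rows, and the extra `trip_distance` and `fare_amount` columns are assumptions made for illustration.

```python
import sqlite3

# In-memory database standing in for the SQLite file described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trips_table (
        medallion TEXT, hack_license TEXT, vendor_id TEXT,
        pickup_datetime TEXT, trip_distance REAL
    );
    CREATE TABLE fares_table (
        medallion TEXT, hack_license TEXT, vendor_id TEXT,
        pickup_datetime TEXT, fare_amount REAL
    );
""")

# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO trips_table VALUES (?, ?, ?, ?, ?)",
    [
        ("M1", "H1", "CMT", "2013-04-01 08:00:00", 2.5),
        ("M2", "H2", "VTS", "2013-04-01 19:05:00", 9.8),
        ("M3", "H3", "CMT", "2013-04-02 03:10:00", 1.1),  # no matching fare row
    ],
)
conn.executemany(
    "INSERT INTO fares_table VALUES (?, ?, ?, ?, ?)",
    [
        ("M1", "H1", "CMT", "2013-04-01 08:00:00", 10.5),
        ("M2", "H2", "VTS", "2013-04-01 19:05:00", 30.0),
    ],
)

# Inner join on the four common keys; unmatched rows are dropped.
merged = conn.execute("""
    SELECT medallion, hack_license, pickup_datetime, trip_distance, fare_amount
    FROM trips_table
    JOIN fares_table USING (medallion, hack_license, vendor_id, pickup_datetime)
    ORDER BY pickup_datetime
""").fetchall()
conn.close()
```

An inner join is assumed here because the text reports a single merged row count; a left join would instead keep trips that have no fare record.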
Basic Questions
Complete notebook available here .
Q1. What is the distribution of the number of passengers in each cab?
Overwhelmingly taxi cabs are hailed by a sole passenger. More cabs are hailed by a single passenger than the total number of cabs hailed by two or more passengers as shown in Figure 1.
Q2. Do most customers pay with cash or card?
The results are pretty close, with nearly 54% of trips being paid for by credit card.
Q3.1 What does the distribution of fare amounts look like?
The initial charge on a cab is $2.50, so I confirm there are no fares below this amount. The most expensive fare was $204 while most fares were under $35.
Winsorizing the fare amount data by removing the top and bottom 1% shows the median fare amount is $9 as shown below. The bottom 5% of fare amounts are less than $5 while the top 5% of fares are higher than $24.
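Strictly speaking, removing the extreme 1% at each end is usually called trimming, while winsorizing clamps the extremes to the nearest retained value. The sketch below shows both on hypothetical fare values; the helper names `trim` and `winsorize` and the sample data are assumptions, not code from the original notebook.

```python
import statistics


def trim(values, fraction=0.01):
    """Drop the lowest and highest `fraction` of the values."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)
    return ordered[k:len(ordered) - k]


def winsorize(values, fraction=0.01):
    """Clamp the lowest and highest `fraction` of the values instead of dropping them."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)
    if k == 0:
        return ordered
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(v, low), high) for v in ordered]


# Hypothetical fares: one very low value, one very high value, the rest typical.
fares = [2.5] + [5.0] * 10 + [9.0] * 78 + [24.0] * 10 + [204.0]

trimmed = trim(fares)
winsorized = winsorize(fares)
median_trimmed = statistics.median(trimmed)
median_winsorized = statistics.median(winsorized)
```

Both approaches leave the median essentially unchanged, which is why the choice matters more for means and standard deviations than for the medians quoted in the text.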
Q3.2 Is there a difference between airport and non-airport fare amounts?
Figure 5 shows the median and modal airport fare amount is $52. This is in contrast to non-airport fares, which tend to be under $35.
I looked into whether this specific fare amount was more or less likely to be paid by cash or credit card but the results were split with 56% of passengers choosing to pay with card and 44% choosing to pay with cash.
Q4.1 What does the distribution of tip amount look like?
When looking across all rides, most passengers don't appear to tip well. While tip amounts via credit card can be verified, cash tips may be underreported by the taxi drivers themselves. Winsorizing the distribution of tip amounts, the modal tip amount is $0 while the median is $1.
Q4.2 Is there a difference in the distribution of airport versus non-airport tips?
However, passengers tend to be more generous when it comes to tipping the cabbies that take them to the airport. Even though there are fewer airport fares compared to non-airport fares, it is understandable that drivers would want to take more airport trips.
Q5.1 What does the distribution of total amount look like?
Given the relatively low tip amounts reported, the distribution of the total amount will be similar to the distribution of the fare amount. Winsorizing the total amount by removing the top and bottom 1% shows the median amount is just shy of $11.
Q5.2 Is there a difference between airport and non-airport total amounts?
Airport total amounts are higher than non-airport total amounts which makes sense as airport fares are higher than non-airport fares. The median/modal airport total amount is $57.83.
Q6. What are the top 5 busiest hours of the day?
Evenings after work or dinner appear to be the busiest, with most cabs being hailed at 7pm. The heatmap below shows that most taxi trips occur on Monday and Tuesday evenings between 6pm and 8pm, followed by Friday and Saturday evenings.
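Ranking the hours of the day by pickup count could be done along these lines. This is a minimal sketch on hypothetical timestamps; the timestamp format and the `busiest_hours` helper are assumptions, not the original notebook's code.

```python
from collections import Counter
from datetime import datetime


def busiest_hours(pickup_times, top_n=5):
    """Return the top_n (hour, trip_count) pairs, busiest first."""
    counts = Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour for ts in pickup_times
    )
    return counts.most_common(top_n)


# Hypothetical pickup timestamps.
sample_pickups = [
    "2013-04-01 19:02:00",
    "2013-04-01 19:40:00",
    "2013-04-02 19:15:00",
    "2013-04-02 08:05:00",
    "2013-04-03 08:30:00",
    "2013-04-03 03:20:00",
]
top_hours = busiest_hours(sample_pickups, top_n=2)
```

Grouping by weekday and hour together, as in the heatmap, only requires using `(dt.weekday(), dt.hour)` as the counter key.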
Q7. What are the top 10 busiest locations in the city?
I round latitude and longitude to 2 decimal places and then sort through the most popular pickup and dropoff locations. The most popular pickup and dropoff locations are identical and all lie in Manhattan, accounting for over 12.36 million trips. Rounding latitude and longitude increases the clustering of pickup and dropoff points.
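The rounding step effectively bins coordinates into grid cells that can then be counted. The sketch below assumes rounding to the nearest value with Python's `round`; the original may have truncated instead, and the sample coordinates and `top_locations` helper are hypothetical.

```python
from collections import Counter


def top_locations(points, top_n=10, decimals=2):
    """Count points per rounded (lat, lon) cell and return the most common cells."""
    cells = Counter(
        (round(lat, decimals), round(lon, decimals)) for lat, lon in points
    )
    return cells.most_common(top_n)


# Hypothetical pickup coordinates.
pickups = [
    (40.7580, -73.9855),
    (40.7612, -73.9891),
    (40.7128, -74.0060),
]
busiest_cell, trip_count = top_locations(pickups, top_n=1)[0]
```

At this latitude two decimal places correspond to cells roughly a kilometre across, so nearby street corners fall into the same cell.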
Q8. Which trip has the highest standard deviation of travel times?
Each trip is uniquely defined by its pickup and dropoff co-ordinates. It is important to determine what minimum sample size of trips to use to calculate the standard deviation. If a trip occurs only once, for example, then we cannot calculate its standard deviation of travel times. What minimum sample size do we need to determine the standard deviation of trip times? If a trip has occurred twice, once taking 5 minutes and once taking an hour, it will have a very high standard deviation based on a very small sample.
I make the following assumptions:
- Margin of Error = 5%
- Confidence Level = 95%, which corresponds to a Z-Score of 1.96
- Standard Deviation = 0.5 (assuming a 50% standard deviation ensures a large enough sample size)
The required sample size = ((1.96 x 0.5)/0.05)^2 = 384.16, rounded up to 385 trips, which is the minimum threshold of trips applied. All routes with fewer than 385 trips over the month are excluded. This minimum threshold is applied when answering all questions going forward.
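The arithmetic above can be checked in a few lines. The function name `required_sample_size` is a hypothetical helper; the formula and inputs are the ones stated in the text.

```python
import math


def required_sample_size(z_score, std_dev, margin_of_error):
    """Sample size n = (z * sigma / e)^2, rounded up to a whole number of trips."""
    return math.ceil((z_score * std_dev / margin_of_error) ** 2)


min_trips = required_sample_size(z_score=1.96, std_dev=0.5, margin_of_error=0.05)
```

Rounding up rather than to the nearest integer guarantees the margin of error is not exceeded.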
Travel times for trips originating from La Guardia airport to New York's boroughs have the largest variance. Apparently airport traffic IS a nightmare.
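Computing the standard deviation of travel time per route, while skipping routes below the minimum trip count, could be sketched as follows. The tuple layout, route labels, and `route_duration_std` helper are assumptions; the demonstration uses a threshold of 3 instead of 385 so that the toy data is enough.

```python
import statistics
from collections import defaultdict

MIN_TRIPS = 385  # threshold derived in the text


def route_duration_std(trips, min_trips=MIN_TRIPS):
    """Map each (pickup, dropoff) route to the std dev of its trip minutes.

    Routes with fewer than `min_trips` trips are left out.
    """
    durations = defaultdict(list)
    for pickup, dropoff, minutes in trips:
        durations[(pickup, dropoff)].append(minutes)
    return {
        route: statistics.stdev(values)
        for route, values in durations.items()
        if len(values) >= min_trips
    }


# Hypothetical trips: (pickup cell, dropoff cell, minutes).
sample_trips = [
    ("LGA", "Midtown", 10.0),
    ("LGA", "Midtown", 20.0),
    ("LGA", "Midtown", 30.0),
    ("Midtown", "SoHo", 5.0),
    ("Midtown", "SoHo", 60.0),  # only two trips: excluded by the threshold
]
std_by_route = route_duration_std(sample_trips, min_trips=3)
most_variable_route = max(std_by_route, key=std_by_route.get)
```

The two-trip route has a large spread but is excluded, which is exactly the small-sample problem the threshold is meant to avoid.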
Q9. Which trip has the most consistent fares?
Using the same minimum sample size threshold, I now examine the 5 routes whose fare amounts have the lowest standard deviation, as these will be the most consistent fares. Figure 13 reveals that these are shorter non-airport routes. Three of these trips begin at the same location in Manhattan. To get more color on the differences between trips, it would be interesting to know the time of day and day of the week on which the trips occurred.
Open Questions
Q10. For which trips can we confidently use means as measures of central tendency to estimate fares and time taken?
As mentioned in question 8 above, certain trips may only occur once or twice, making calculations of central tendency based on these trips biased and erroneous. If the same trip takes twice as long for one taxi driver as it does for another, and our population is two trips, this skews calculated means and variances.
So how many occurrences of the same trip - identified as beginning and ending at the same geocodes - are required before measures of central tendency can be calculated with confidence? Among 14 million trips, should the threshold be 50 occurrences of the same trip or 1000?
The required sample size = ((1.96 x 0.5)/0.05)^2 = 384.16, rounded up to 385 trips, which is the minimum number of trip occurrences over the month needed to be comfortable calculating measures of central tendency to estimate fares. Most of the trips that originate and end in Manhattan, including trips to La Guardia or JFK airports, cross this threshold easily.
Q11. Build a model of Taxi Fare and tip given pickup and dropoff location
Fare modeling notebook available here while the tip prediction notebook can be found here .
I examine the correlations between trip fare (and log fare) versus the features in my database. It stands to reason that several features will have a high correlation with the fare including how long the trip was and the distance covered.
The variables chosen to predict fare and percentage tip include:
- Average Speed Each Hour
- Trips per Hour
- Pickup Longitude and Latitude
- Dropoff Longitude and Latitude
- Trip Distance
- Pickup Hour
- Dropoff Hour
- Day of Week
- Day of Month
As there are millions of data points, I use a RANSAC regression model with the 9 features above, as it is robust to outliers in the y-axis. The model has an R-square of 80.7%. The OLS model has a slightly higher R-square of 82%, which persists after running a five-fold cross validation. I show a line of best fit among the fare data in Figure 15 below.
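To illustrate why RANSAC is robust to outliers in the target, here is a minimal single-feature sketch in plain Python. It is not the model used in the post, which has 9 features and was presumably built with a library; the data, the inlier threshold, and the helper names are hypothetical.

```python
import random


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ransac_line(points, n_iterations=200, threshold=1.0, seed=0):
    """Fit a line to the largest set of points that agree with a 2-point model."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iterations):
        sample = rng.sample(points, 2)
        if sample[0][0] == sample[1][0]:
            continue  # identical x values cannot define a slope
        slope, intercept = fit_line(sample)
        inliers = [
            (x, y) for x, y in points
            if abs(y - (slope * x + intercept)) <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on all inliers of the best candidate model.
    return fit_line(best_inliers), best_inliers


# Hypothetical fares: $2.50 initial charge plus $2.00 per mile, plus a few
# flat fares near $50 that do not follow the metered relationship.
metered = [(float(d), 2.5 + 2.0 * d) for d in range(1, 21)]
flat = [(2.0, 50.0), (5.0, 50.0), (9.0, 50.0), (14.0, 52.0)]
(slope, intercept), inliers = ransac_line(metered + flat)
ols_slope, ols_intercept = fit_line(metered + flat)
```

On this toy data the RANSAC fit recovers the metered relationship, while the plain OLS fit is pulled toward the flat fares.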
Note the scatter plot of predicted versus actual fares shows a cluster of fares at the $50 mark, which requires further investigation. Either these fares were rounded or mis-reported by the taxi drivers. It may seem obvious that taxi drivers may negotiate lower fares up to $50. What is more puzzling are fares where our linear model predicts a high fare, but the taxi driver only reports $50. Are drivers pocketing these large fares?
When it comes to predicting the percentage of the fare that will be left as a tip, I use a similar linear model. This time an OLS regression has an R-square of 1.2% when it comes to predicting how much of the fare will be a tip. The linear model does a very poor job of predicting the percentage tip amount. We fare slightly better using a Neural Network and a Random Forest Regressor.
Q12. How would a taxi owner maximize earnings in a day?
I distinguish between a taxi owner and a taxi driver as follows. The taxi driver is represented by the hack license, and drivers' average earnings in April 2013 were $259 a day. A taxi owner owns the medallion for a given taxi and can have several drivers drive their cab. The average daily earning by a medallion (taxicab) was $480 per day.
The average daily revenue for a taxicab is therefore $480 per day. The constraining factor here is driving hours per day. I consider two approaches: looking at the routes that generate the highest daily revenue and the routes that earn the highest revenue per hour. This way a taxi owner can either concentrate on areas that generate the highest revenue or lease out a medallion taxi to two drivers driving 12 hour shifts to maximize daily revenue.
Figure 17 plots all routes that generate the highest daily revenue and the total number of trips required to generate this revenue. Note that these trips are all based in Manhattan, except for one which is from Manhattan to La Guardia Airport.
Note from question 8 that the trip to LaGuardia airport has the highest standard deviation of travel times. As the total amount charged will vary with the time of the trip, consistently relying on this trip for maximizing revenue may not be the best solution. A taxi owner could prioritize taxi bookings for Manhattan trips and potentially take on other trips once they have crossed the average daily revenue of a taxicab ($480).
Another way to consider this problem is that we are trying to maximize revenue in the available time. I build a feature which is the ratio of total_amount to time driven in hours and see which routes maximize the earnings per hour driven. Drivers can then prioritize these routes. Note the assumption here is that these are the most profitable routes per hour regardless of time of day.
Interestingly, these trips begin and end at the same geocode (rounded to 2 d.p.) and appear to be intra-Manhattan or intra-airport geocode trips. These shorter trips can be used to maximize daily earnings. The results when trying to maximize the total amount earned per mile driven are identical.
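The earnings-per-hour feature could be aggregated by route as in the sketch below. Aggregating total revenue over total hours per route is an assumption about how the ratio was summarized; the route labels, sample values, and `revenue_per_hour` helper are hypothetical.

```python
from collections import defaultdict


def revenue_per_hour(trips):
    """Map each route to total_amount earned per hour driven on that route."""
    totals = defaultdict(lambda: [0.0, 0.0])  # route -> [amount, hours]
    for route, total_amount, minutes in trips:
        totals[route][0] += total_amount
        totals[route][1] += minutes / 60.0
    return {
        route: amount / hours
        for route, (amount, hours) in totals.items()
        if hours > 0
    }


# Hypothetical trips: (route, total_amount in dollars, trip time in minutes).
sample_trips = [
    ("Midtown->Midtown", 10.0, 30),
    ("Midtown->Midtown", 20.0, 30),
    ("Midtown->LGA", 15.0, 60),
]
rates = revenue_per_hour(sample_trips)
best_route = max(rates, key=rates.get)
```

Swapping minutes for miles in the denominator gives the earnings-per-mile variant mentioned above.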
It may be unrealistic to expect taxi drivers to keep driving back and forth between the same set of streets all day. In this case it may bear looking at less crowded routes as discussed in Q14 below.
Q13. How would a taxi owner minimize work time while retaining average wages earned by a typical taxi in the dataset?
As mentioned above, the average daily earning of a taxicab is $480. Looking at Figure 11 again, note that demand for taxicabs is highest on Monday and Tuesday evenings between 6pm and 8pm or Monday and Tuesday mornings between 8 and 10 am. This is the morning and evening office traffic.
A taxi owner looking to minimize their driver's work time should ensure drivers are working the morning rush and evening shifts from 6pm onwards. By contrast, demand for taxis is much lower between 3am and 5am Monday to Thursday, so these hours can be skipped.
By far the evening route that generates the most revenue per hour begins and ends at geocode (40.77, -73.86) and is an airport to airport transfer route. It is followed by several routes within Manhattan, which are shown in Figure 16 below.
These evening routes generate the highest total fare amount per hour of driving. Taxis can focus on these locations until they hit their daily goal of $480 (or $259 per driver). After clearing their goal they can move on to more varied fares. Alternatively, if a taxi driver started off at a location other than the ones highlighted in Figure 16, they can drive to these routes to make up their daily fares.
Q14. How would a taxi company with 10 taxis, maximize earnings?
Assume each taxi can be driven all day by 2 drivers working 12 hour shifts without wear and tear. This translates to 20 shifts per day for the taxi company. I would ensure taxis are available at the most popular pickup and dropoff locations and for trips with the most consistent fares. However, you wouldn't want taxis working for the same company to undercut each other for the same fare.
My analysis so far reveals several insights for a smaller taxi company:
a. Instruct taxi drivers to focus on the routes with the highest earnings per hour (or earnings per mile). This would keep taxis working within smaller areas (zipcodes) and would allow the company to keep a fleet of cars working airport shifts and another fleet working Manhattan island shifts. The difficulty here is whether taxis could legally deny service to passengers who want to travel out of these zones.
b. Once a taxi driver has earned half the average daily revenue of a taxicab ($240) during their shift, give them the option to take out of town fares or fares whose trip times have a higher standard deviation, e.g. Manhattan to LaGuardia fares, where the total amount earned may be higher.
c. The worst times to take a taxi off the road for servicing are weeknights and Friday and Saturday nights, as these tend to be the busiest times. Get taxis serviced during the day.
d. The most popular routes may be overcrowded, so it may be worth focusing on trips that generate the highest total revenue with the smallest number of individual trips, as shown below.
Further Questions
It would be interesting to see the impact of services such as Uber or Lyft on taxi demand over time. Also of interest would be the impact of precipitation or temperature on a particular day on taxi demand, fare and tip. Finally, information on traffic congestion and road conditions would be invaluable for getting more insight from this dataset.
New York City taxi trip duration prediction using MLP and XGBoost
1 College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar
Mohit Malviya
2 Department of CTO 5G, Wipro Limited, Bengaluru, India
Chahat Kumar
3 CFO Technology, Enterprise Risk Function Technology, Bank of America, Chennai, India
Mounir Hamdi
V. Vijayakumar
4 University of South Wales, Sydney, Australia
Jamel Nebhen
5 College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, P.O. Box 151, Alkharj, 11942 Saudi Arabia
Hasan Alyamani
6 Department of Information Systems, Faculty of Computing and Information Technology in Rabigh, King Abdulaziz University, Rabigh, 21911 Saudi Arabia
New York City taxi rides form the core of the traffic in the city of New York. The many rides taken every day by New Yorkers can give a good picture of traffic times, road blockages, and so on. Predicting the duration of a taxi trip is important because users want to know precisely how long it will take to travel from one place to another. Given the rising popularity of app-based taxi services from vendors such as Ola and Uber, competitive pricing has to be offered to ensure users choose them. Predicting the duration and price of trips can help users plan their trips properly and leave a margin for traffic congestion. It can also help drivers choose the route that takes the least time. Moreover, transparency about pricing and trip duration will help attract users at times when popular app-based taxi services apply surge fares. In this study, we therefore used real-time data that customers provide at the start of a ride, or while booking one, to predict the duration and fare. This data includes pickup and drop-off point coordinates, the distance of the trip, start time, number of passengers, and a rate code for the class of cab, indicating whether the regular or the airport rate applies. We then applied XGBoost and Multi-Layer Perceptron models to find out which of them provides better accuracy and better captures the relationships between the real-time variables. Finally, a comparison of the two algorithms leads us to conclude that XGBoost is better suited and more efficient than the Multi-Layer Perceptron for taxi trip duration prediction.
Introduction
The world's large population is constantly on the move, and advances in technology have led to many modes of transportation, including buses, autos, and especially taxi services. New York City is one of the most advanced cities in the world and makes extensive use of taxi services. With its vast population, the city requires widely available transportation and operates a very large transportation system. New York has one of the largest subway systems in the world as well as a fleet of green and yellow cabs numbering around 13,000 taxis. Most of New York's population depends on public transport, and it has been estimated that 54 percent of people do not own a car or other personal vehicle. As a result, the city accounts for almost 200 million taxi trips per year.
The dataset we used is available on Kaggle; it was collected over several years, together with related data, and released to the public for further analysis. We used a collection of these datasets covering around 3 years of NYC taxi trip data: about 1.5 million (15 lakh) records of taxi trips from January 2017 to January 2020.
Among the machine learning models that provide reliable accuracy for prediction use-cases, we consider XGBoost and MLP because of their ability to model complex interactions between features. Successful prediction of taxi trip duration in New York City could later be extended to make trip duration predictions for other cities.
XGBoost is short for “Extreme Gradient Boosting” and belongs to the family of ensemble learning algorithms. It is a flexible implementation of gradient boosting built on decision trees (Gupta et al. 2020). Moreover, it is often found to be much faster than more common boosting algorithms such as AdaBoost.
Further, it has recently become dominant in applied machine learning and has received much attention in Kaggle competitions. Execution speed and model performance (Qureshi et al. 2020) are the two main reasons for using this algorithm in our work.
Multilayer Perceptron (MLP)
A perceptron is a linear classifier that produces a single output from a weighted combination of its inputs. A multilayer perceptron (MLP) is a class of feedforward artificial neural network (Sharma et al. 2020) that forms the basis of many deep learning models. It is composed of more than one perceptron arranged in layers, and its nodes use non-linear activation functions. It is trained with the backpropagation algorithm, a supervised learning method (Butgereit and Martinus 2019). Thanks to its multiple layers and non-linear activations, an MLP can separate data that is not linearly separable, which distinguishes it (Kabán 2019) from a linear perceptron.
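To make the role of the hidden layer and the non-linear activation concrete, the following minimal sketch builds a two-layer perceptron that computes XOR, a function no single linear perceptron can represent. The weights are chosen by hand for illustration rather than learned by backpropagation, and the helper names (`relu`, `dense`, `xor_mlp`) are hypothetical; this is not the network used in this paper.

```python
def relu(z):
    """Rectifier activation: passes positive values, clips negatives to zero."""
    return max(0.0, z)


def identity(z):
    """Linear output activation."""
    return z


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W . inputs + b) for each unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def xor_mlp(x1, x2):
    """Two hidden ReLU units followed by one linear output unit.

    The weights are hand-picked (an assumption for illustration only).
    """
    hidden = dense([x1, x2],
                   weights=[[1.0, 1.0], [1.0, 1.0]],
                   biases=[0.0, -1.0],
                   activation=relu)
    (output,) = dense(hidden, weights=[[1.0, -2.0]], biases=[0.0],
                      activation=identity)
    return output


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_mlp(a, b))
```

Removing the `relu` call from the hidden layer collapses the network into a single linear map, which can no longer reproduce XOR.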
Thus, the contributions of this paper are as follows:
- Since the duration of a taxi trip depends heavily on the time at which the trip is made, prediction becomes highly complex. We therefore take the time of the trip into account for reliable predictions. We also exclude coordinates that lie outside New York City, treating them as outliers. Using XGBoost combined with K-Means clustering, and given location, date, and time variables, we estimate the ride duration from real-time data collected from taxis.
- The Multi-Layer Perceptron model is then used to determine the relationships between the real-time variables taken from the taxi data.
- The XGBoost and Multi-Layer Perceptron models are then compared to determine which is more suitable and reliable for New York City taxi trip prediction.
The remainder of the paper is organized as follows. Sect. 2 discusses related work. The New York City taxi duration dataset is described in Sect. 3, followed by the methodology in Sect. 4. Simulation results and performance evaluation are provided in Sect. 5, before we summarize our concluding remarks in Sect. 6.
Related work
We studied a variety of different research works in the topics of Neural Networks, Multi-layer Perceptron, Bagging and Boosting, and other ML algorithms like AdaBoost and XGBoost for prediction-based methods. We tried to understand the methodology and workflow of each algorithm and how it would be beneficial to our project. The analysis of the research papers helped us to gain a number of possible insights, advantages, and disadvantages of the algorithms which could potentially provide the best solution for our problem statement. Based on the analysis, we reached a conclusion on how to work on the mentioned New York City Taxi Prediction use-case.
We started with (Ran et al. 2020 ) where speed and traffic stream were taken into account as the contribution to the model. The maximum places acquired by the K-means++ model and calibrations acquired by the XGBoost model are utilized to find out the Euclidean distance(ED). The base estimation of the calculated values gets utilized as the prescient estimation of the congestion level caused by different vehicles. As indicated by the forecast trial of I15-N interstate traffic information in PeMS information base, the joined model outstripped different models and the prescient exactness of the consolidated model came up at 94.47%. Further, (Liao et al. 2019 ) was considered where-in a load anticipating procedure dependent on XGBoost along with comparative days was proposed. This mechanism was used to break down the basic meteorological laws and everyday types based on the heap load. The XGBoost algorithm with the loss function and Taylor extension were added to the different quantitative terms to control the unnecessary fitting and intricacy. The charge-based and temperature-based information in a specific territory was completely taken as different sets of the test. The conclusions provided that the proposed XGBoost model can anticipate the heap-based load quite adequately.
To add more, (Wang et al. 2020 ) presented a driving conduct wellbeing assessment SVM-based mechanism which separated out the different values of distributive features to get the ideal order of hyperplane and afterward utilized the mathematical stretch as the assessment list for driving conduct wellbeing. Simultaneously, driving conduct, crowd-based streets, proficiency, sparing of energy, and climate factors with various other loads were considered, and thereafter it partitioned various driving conduct in four types: Good, Normal, Above the threshold, and Unfit level in view of SVM and KMeans. Subsequently, the XGBoost inherent mechanism was utilized. The test inferred the normal precision of 99.21% and the normal review rate of 98.5% which eventually demonstrated the whole operation was truly viable and attainable. To comprehend the innovations in XGBoost technology, (Cao et al. 2020 ) threw light upon a momentary traffic stream forecast model. This technique was dependent on best and worst inclination rise such that the analysis results uncovered the predominance of the whole system by contrasting it with the previous anticipation model.
Moreover, (Yang et al. 2020 ) was put into consideration as it reflected LC choice procedure that enabled vehicles based on autonomous ability to settle on human-like choices. This technique joined the XGBoost algorithm alongside a profound autoencoder (DAE) network-based technology. Initially, an autoencoder gets used to assemble a strong multi-component reformation structure utilizing time arrangement information from a different category of sensors. Thereafter, the recreated log errors pertaining to the DAE get prepared with other primary and secondary information, and as such, the whole process gets examined for LCID. Thus, the preparation of information extraction was made accordingly and at this point, to address the non-symmetric and multifactorial issue of the LC dynamic cycle, a Bayesian boundary enhancement with an XGBoost calculation came into the effect. In the interim, to completely prepare the learning model with a huge scope of data information sets, a proposal of a web-based preparing methodology was furnished to refresh the model boundaries with information clusters. The exploratory outcomes delineated that the given model can precisely distinguish the LC conduct of vehicles. Moreover, when information of similar parameters was added, the whole structure accomplished preferable execution over other mainstream techniques.
In order to understand the holistic environment of XGBoost, (Montiel et al. 2020 ) was taken which introduced a transformation of XGB for characterization of developing information-based varied streams. Here, new information gets shown up over the long haul and the connection between the highlights and varied-classes was getting changed simultaneously. This technique made new individuals of the entity based on ensembling as new entry points which later gets opened up as set by the required changes. The greatest group size was allowed to be fixed, yet the process of learning various features didn’t get stopped in light of the fact that the model was refreshed on new information to guarantee uniformity with the latest ideas. Likewise, an investigation of the utilization of drift concept identification was done to activate a component so as to refresh the group. Testing of the technique on manufactured information with drift identification was made available and later, it was differentiated against other methods of classification for information streams. The results proved out to have a powerful impact produced by the proposed idea over other previous methodology used.
To become familiar with the Multi-Layer Perceptron, we considered (Ayyappa et al. 2020), which proposed a computerized tumor recognition procedure to help doctors recognize brain tumors. There, an MLP combined with Gaussian filtering and a back-propagation (BP) neural network was evaluated; it identified brain tumors with an accuracy of 93%, compared with other classification methods such as SVM and PNN. Likewise, (Sunindyo and Satria 2020) investigated the possibility of using CCTV footage to make predictions from regular traffic data. The footage was processed automatically using object detection and tracking to obtain adequate traffic data points. The traffic data (Suresh et al. 2021) was then modelled with both LSTM and MLP. The performance of the whole structure was measured using RMSE. This investigation demonstrated that processed CCTV footage is a practical option for congestion prediction. The best model achieved an RMSE of 1.88, using counts of vehicles, buses, and trucks as the predicted variable with a fortified MLP strategy.
To enhance the idea of MLP, (Khamees et al. 2020 ) was utilized to find out another methodology for preparing the MLP in light of the crow-search streamlining mechanism. The primary target of this methodology was to diminish varied shortcomings to its base level and increment the pace of the classifying process. The marked threshold of the given execution was accomplished by fabricating distinct typical datasets for the process of classification. As such, it was also done to guarantee that the nature of the outcome remains high, and additionally, this mechanism was later contrasted with other classification algorithms, for example, ACO, GA and PSO. The results showed up that the search based on crow streamline calculation was most accurate as it delivered the most elevated precision rate and tackled the improvement and optimization issue effectively.
Hereafter, (Wu et al. 2019 ) was availed to acknowledge another compounded variable choice mechanism for non-symmetric MLP process. The provided operation used some garrote-based conceptualization on non-negative numerical values to pack the different weights pertaining to the MLP structure. Weights that provided zero subordinate factors as input were taken out from the underlying information. Then, a factor determination was done by using optimization calculation which got carried out on extremal parameters. The new factor choice calculation was then coordinated out which combined a great determination capacity dedicated to NNG and the exact nearby capacity of EO. Lastly, two instances of informational collections and a modern debutanizer application were actualized to show the efficiency of the new structure. The outcome exhibited that the created approach presented a much greater execution alongside the variable which provided fewer input data than the other variable decision strategies.
As prediction algorithms become increasingly important, Irio et al. (2021) suggested a model that transformed vehicle trajectory data into sequences of locations derived from GPS and built a statistical inference algorithm that was used for online mobility prediction. The inference algorithm was based on a hidden Markov model (HMM), with every trajectory modelled as a subset of discrete/continuous locations. The prediction model used the statistical data inferred up to that point and relied on the Viterbi algorithm to identify the most probable subsets of locations; the maximum likelihood over the earlier location subsets was used to produce the prediction. Additionally, a hybrid deep neural network prediction model was proposed by (Duan et al. 2019), based mainly on convolutional LSTM (ConvLSTM) techniques. The relationships between origin-destination (OD) flows and travel time were investigated and combined as inputs to the prediction model. The work also presented a grid-based and road-network-based technique to predict OD flows at several road-network scales, addressing traffic flow issues that a grid-based representation alone cannot capture.
In addition to above, (Zhang et al. 2020 ) exhibited a learning-model based on various parallel tasks such that it contained three equalized-parallel layers of LSTM for co-foreseeing pickup and drop-off taxi requests. It also helped in contrasting multiple exhibitions of expectation procedures related to single interest and co-forecast strategy requests associated with two interest-based parameters. Exploratory outcomes on provided datasets showed the imperative and extensive dependence of pickup and drop-off requests upon one another which in-turn delivered solidarity governing adequacy based on the suggested co-forecasting strategies. Furthermore, (Kankanamge et al. 2019 ) utilized the sophisticated idea of gathering several taxi time-based travel directions connected with static parameters. It then involved isolated-based XGBoost models with respect to regression conditions alongside the above-mentioned data. Here, a bunch of extraordinary molded excursions and distinguished inlier were discretely differentiated with the use of prevailing leading algorithms. This permitted to furnish of the impressive prediction techniques of the XGB-IN prototype such that it produced less root mean squared error and mean absolute error in accordance with the real-world time travel figures. Further, it also facilitated to provide models based on XGB-Extreme mechanisms which gave sensibly precise expectation outcomes to a bundle of maximal-configured journeys accompanied by limited real-time taxi rides.
Consequently, (Maddikunta et al. 2020) investigated a robust random forest regression model for predicting the battery life of IoT devices. Data pre-processing techniques such as dimensionality reduction, normalization, and transformation were applied, and the model attained a predictive accuracy of about 97% across the various scenarios. It was also demonstrated that the evaluated model performed better in sustaining the battery life of IoT devices than existing state-of-the-art regression algorithms.
A better understanding of the methodology useful for the prediction can be provided using (Poongodi et al. 2020a ) where it employed maximum likelihood estimation to formulate the probabilities using the Logistic Regression Model. Here, an iterative-based regression algorithm was set to take place on all of the classes such that at least each of them was counted for various prediction structures. Later, (Poongodi et al. 2020b ) was studied which encompassed a Decentralized Autonomous Organization (DAO) to create a wholly sustainable and tidy community predictive development throughout the real-time world settings. Accompanying the use of the ML algorithms, (Poongodi et al. 2020c ) enhanced and improved the predictive monetary situation of all individuals connected officially with the different clusters of establishments and businesses by utilizing a model in-together which included various ML algorithms such as Hierarchical clustering, Decision tree, KNN clustering, etc. Extending different ideas, [24-25] reused or retransformed Linear SVM technology by using the prediction of any two given observations rather than the observations themselves. This accompanied to provide better and superior results for their researched use-cases. A predictive-based recommendation system was used in (Poongodi et al. 2019 ) where-in complex and normalized XGBoost Algorithms were used for the user credibility parameters. A number of factors based on the purchase and review history of the users were taken into consideration to develop a smooth and flexible prediction recommendation system.
To explore prediction further, (Alazab et al. 2020) extended the smart grid CPS use-case by incorporating schemes coupled with the Multidirectional Long Short-Term Memory (MLSTM) technique, in order to predict the stability of the smart grid network accurately. A comparison between existing deep learning methods such as RNN (Guo et al. 2020), GRU, and conventional LSTM and the suggested MLSTM procedure showed that the latter outperforms (Kashif et al. 2020) the other ML prediction models. Finally, (Muhammad et al. 2021) applied multiple supervised ML algorithms such as SVM, naive Bayes, CNN, RNN, logistic regression, and decision tree to a labelled real-world epidemiological coronavirus dataset in order to detect COVID-19. A major part of the procedure was devoted to cleaning the data, which helped reveal strong correlations between the independent and dependent features of the chosen dataset. Based on a critical analysis of the ML approaches, the decision tree model was found to achieve the best accuracy, 94.99%, compared with the other techniques.
Thus after careful analysis, we discovered several miscellaneous and mixed drawbacks in the variegated models that were hybridly used in the prediction mechanisms. Supervised Machine Learning models such as Decision tree and random forest classification/regression were found to be superior to others in terms of their sensitivity, specificity, and accuracy due to which the idea of using XGBoost is taken further for the New York City Taxi Prediction use-case. Moreover, the presence of using K-means clustering with XGBoost Model (Tang et al. 2020 ) over the rest of the Unsupervised ML techniques was noticed because of its convergeable, scalable and adaptable properties. Subsequently, the employment of the Multi-Layer Perceptron is involved in the second part of this research paper since it turned out to provide higher heteroskedasticity and an added advantage of solving complex and non-linear problems. Following the standards of the neural network, MLP based models aid to deduce hidden interconnections within the real-time multiplex datasets (Tang et al. 2020 ) which eventually supports in making out efficient and improved methods (Chinmay and Rodrigues Joel 2020 ) for the mentioned taxi prediction application.
Dataset description
The New York City Taxi Duration dataset is taken from the Kaggle website, which provides free access to data science challenges. This dataset lets us predict the trip duration of a taxi ride, taking into account the different factors that affect it. In addition, a second dataset containing the weather conditions of the city is included. The two datasets are combined during pre-processing into a single dataset that is used for trip duration prediction. Some of the important attributes of the dataset are described below:
- id , which provides a unique identification to a trip.
- vendor id , a unique code which gets assigned to the different cab companies.
- pickup datetime , date and time at which the trip started.
- dropoff datetime , date and time at which the trip ended.
- passenger count , number of passengers travelling on a particular trip.
- pickup longitude , longitudinal location of the pickup.
- pickup latitude , latitudinal location of the pickup.
- dropoff longitude , longitudinal location of the drop off.
- dropoff latitude , latitudinal location of the drop off.
- store and fwd flag , a code indicating whether the trip record was stored on the device before being forwarded to the database.
- trip duration , the total time of the trip in seconds.
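As an illustration of how the datetime attributes relate to the trip duration target and to time-of-trip features, the following sketch derives them for a single record using only the Python standard library. The timestamp layout in `TIMESTAMP_FORMAT`, the function name `time_features`, and the sample record are assumptions made for this example; they are not taken from the dataset.

```python
from datetime import datetime

# Assumed layout of the pickup/dropoff datetime columns.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def time_features(pickup_datetime, dropoff_datetime):
    """Derive the duration and calendar features from one trip's timestamps."""
    pickup = datetime.strptime(pickup_datetime, TIMESTAMP_FORMAT)
    dropoff = datetime.strptime(dropoff_datetime, TIMESTAMP_FORMAT)
    return {
        "trip_duration": int((dropoff - pickup).total_seconds()),  # seconds
        "pickup_hour": pickup.hour,
        "pickup_weekday": pickup.weekday(),  # Monday = 0 ... Sunday = 6
        "pickup_month": pickup.month,
    }


# Hypothetical record for illustration only.
features = time_features("2017-03-14 17:24:55", "2017-03-14 17:32:30")
print(features)
```

Features such as the pickup hour and weekday are one simple way to expose the time dependence of trip duration to a model.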
The second dataset comprises the climatic data of the city which includes vital information such as the time of rainfall, sunlight, and various other factors which can be used for better prediction of the taxi trip.
Proposed methodology
Our kernel is written in an IPython Notebook and uses an XGBoost model with the assistance of a mini-batch K-means clustering algorithm. The workflow of the kernel includes the following steps:
- First, all the necessary libraries, including the Sklearn library, are imported.
- Both datasets are then imported in order to analyse the various attributes of the taxi trip duration.
- Summary statistics such as the mean, variance, and quartiles of all the features are then computed. While computing them, the data is regularly checked for mismatches.
- Thereafter, mini-batch clustering is used. Because it is highly susceptible to outliers, the data is first cleaned to remove outliers so that the algorithm works efficiently.
- The cleaned data is then analysed in depth for further feature extraction by examining the correlations in the data, to ensure maximum coverage.
- Three quantities are computed between the pickup and drop-off locations: the Manhattan distance, the haversine distance, and the bearing. The Manhattan distance sums the distances travelled along the two coordinate axes, as on a rectangular street grid. Because the earth is round, a flat straight-line distance neglects an important aspect of the route, so the haversine (great-circle) distance is used extensively. The bearing gives the angular direction from the pickup point to the drop-off point.
- The average of the three distances is then calculated and added to the cleaned dataset as an extracted feature, which is used in further analysis.
- Next, the mini-batch K-means algorithm is applied to cluster points on the basis of the pick-up latitude, pick-up longitude, drop-off latitude, and drop-off longitude variables. The clusters obtained are used to find their centres, and the trips are then divided according to these clusters. These area-based clusters are added to the dataset as an extra feature.
- As a result, about 200 features are added in the form of cluster centres: 100 pick-up and 100 drop-off cluster points.
- Finally, the redundant columns are removed and the backbone of the kernel, the XGBoost model, is applied to the dataset with the added features. The results for the taxi trip predictions are then observed.
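The distance features in the workflow above can be sketched with the standard library alone. The snippet below is a minimal illustration rather than the kernel's actual code: the function names and the mean Earth radius constant are assumptions, and the Manhattan distance is approximated here as the sum of an east-west and a north-south haversine leg, which is one common convention and may differ from the one the kernel uses.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def manhattan_km(lat1, lon1, lat2, lon2):
    """Grid-style distance: an east-west leg plus a north-south leg."""
    east_west = haversine_km(lat1, lon1, lat1, lon2)
    north_south = haversine_km(lat1, lon1, lat2, lon1)
    return east_west + north_south


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2, between -180 and 180."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    y = math.sin(dlmb) * math.cos(phi2)
    x = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlmb))
    return math.degrees(math.atan2(y, x))


# Hypothetical pickup and drop-off coordinates.
pickup, dropoff = (40.75, -73.99), (40.77, -73.95)
print(haversine_km(*pickup, *dropoff))
print(manhattan_km(*pickup, *dropoff))
print(bearing_deg(*pickup, *dropoff))
```

The Manhattan value is never smaller than the haversine value for the same pair of points, which is a quick sanity check on extracted features.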
A similar methodology is followed for the multi-layer perceptron, which includes importing libraries and datasets (incorporating external data to improve accuracy), pre-processing the imported datasets, and so on. Rectified Neural Networks are then applied to eliminate outliers appropriately. Finally, linear neural networks are applied to obtain the desired results.
Results and discussion
As shown in Fig. 1, we plot a simple histogram of the trip duration by placing the data into 100 bins. Binning involves taking the data’s maximum and minimum points, subtracting them to get the range, dividing the range by the number of bins to get the interval length, and finally grouping the data points into these intervals.
Fig. 1 Number of training records vs Trip duration
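The binning procedure described above can be written out directly. The following is a minimal sketch with a hypothetical helper `histogram` and made-up durations; plotting libraries perform the same computation internally.

```python
def histogram(values, n_bins=100):
    """Group values into n_bins equal-width intervals between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins  # interval length = range / number of bins
    counts = [0] * n_bins
    for v in values:
        if width > 0:
            # The maximum value would fall just past the last bin, so clamp it.
            index = min(int((v - lo) / width), n_bins - 1)
        else:
            index = 0  # all values identical: everything goes in the first bin
        counts[index] += 1
    return lo, width, counts


# Hypothetical trip durations in seconds: 0, 1, ..., 1000.
durations = list(range(1001))
lo, width, counts = histogram(durations)
print(width, counts[0], counts[-1])
```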
Further, Fig. 2 plots the distribution of the logarithm of the trip duration, which follows a Gaussian-like curve and helps reveal the relationship between the number of trips and the log duration. It also gives an intuitive picture of how taxi services operate in New York City.
Fig. 2 Logarithmic Trip duration
It is important to check whether the training and testing data agree with each other. To do so, we plot the number of trips over time for both the training and testing datasets as a time-series line graph (Fig. 3), both to identify possible trends and to see whether the two datasets follow the same pattern.
Fig. 3 Comparison of Training and Testing Datasets
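The series behind such a comparison are simply trip counts per day for each dataset. The sketch below computes them with the standard library; the function name `trips_per_day`, the timestamp format, and the sample timestamps are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime


def trips_per_day(pickup_times, fmt="%Y-%m-%d %H:%M:%S"):
    """Count trips per calendar day from pickup timestamp strings."""
    return Counter(datetime.strptime(t, fmt).date() for t in pickup_times)


# Hypothetical pickup timestamps for the two splits.
train = ["2017-01-01 08:00:00", "2017-01-01 09:30:00", "2017-01-02 10:00:00"]
test = ["2017-01-01 12:00:00", "2017-01-02 13:00:00"]

train_counts = trips_per_day(train)
test_counts = trips_per_day(test)

# Print both series side by side; plotting them gives the time-series graph.
for day in sorted(set(train_counts) | set(test_counts)):
    print(day, train_counts[day], test_counts[day])
```

If the two series rise and fall together, the splits cover the same period with a similar pattern.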
Next, we use the New York City border coordinates in the kernel to create the canvas on which the coordinate points are plotted. A simple scatter plot displays the actual coordinates and shows whether the pick-up points in the training and testing datasets overlap. This is shown in Fig. 4.
Fig. 4 Comparison of pickup and dropoff points on the map of New York
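Restricting points to the city borders amounts to a bounding-box test on each coordinate pair. The sketch below shows one way to do this; the border values in `NYC_LON_RANGE` and `NYC_LAT_RANGE` are assumed for illustration and are not the values used in the kernel, and the trip records are hypothetical.

```python
# Assumed bounding box for New York City (illustrative values only).
NYC_LON_RANGE = (-74.03, -73.75)
NYC_LAT_RANGE = (40.63, 40.85)


def in_nyc(lat, lon):
    """Return True when the point lies inside the assumed bounding box."""
    return (NYC_LAT_RANGE[0] <= lat <= NYC_LAT_RANGE[1]
            and NYC_LON_RANGE[0] <= lon <= NYC_LON_RANGE[1])


def filter_trips(trips):
    """Keep trips whose pickup and drop-off both fall inside the box.

    Each trip is a tuple: (pickup_lat, pickup_lon, dropoff_lat, dropoff_lon).
    """
    return [t for t in trips if in_nyc(t[0], t[1]) and in_nyc(t[2], t[3])]


# Hypothetical trips: the second one ends far outside the city.
trips = [
    (40.75, -73.99, 40.77, -73.95),
    (40.75, -73.99, 41.50, -73.95),
]
kept = filter_trips(trips)
print(len(kept))
```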
After that, we plot three graphs of the average speed of a taxi by hour of the day, day of the week, and month of the year. This is shown in Fig. 5.
Fig. 5 Average speeds
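Average speed per hour can be obtained by dividing each trip's distance by its duration and averaging within each hour. The sketch below assumes trips are given as `(pickup_hour, distance_km, duration_s)` tuples, a layout chosen for this example; the same grouping applies to weekday and month.

```python
from collections import defaultdict


def average_speed_by_hour(trips):
    """Return {hour: mean speed in km/h} for (hour, distance_km, duration_s) trips."""
    totals = defaultdict(lambda: [0.0, 0])  # hour -> [sum of speeds, trip count]
    for hour, distance_km, duration_s in trips:
        if duration_s <= 0:
            continue  # skip records with no usable duration
        speed_kmh = distance_km / (duration_s / 3600.0)
        totals[hour][0] += speed_kmh
        totals[hour][1] += 1
    return {hour: total / count for hour, (total, count) in totals.items()}


# Hypothetical trips: two in the morning rush hour, one late at night.
trips = [(8, 5.0, 1800), (8, 10.0, 1800), (23, 12.0, 1200)]
speeds = average_speed_by_hour(trips)
print(speeds)
```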
Finally, we plot the feature importance graph in Fig. 6 to see which features are most relevant for accurate results.
Fig. 6 Feature Importance Graph
Next, we run the XGBoost algorithm with the parameters shown below. These parameters can be changed as desired, but one should first study the XGBoost documentation, which greatly helps in understanding how to fine-tune them for better performance and efficiency. The parameters used are:
- max depth = 6
- learning rate = 0.09
- iteration = 250
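For reference, these settings could be expressed for the XGBoost Python package roughly as follows. This is a sketch under assumptions: the `objective` and `eval_metric` entries are not stated in the text and are chosen here as typical values for a regression task scored by RMSE, and the training call is shown only as a comment because it requires the `xgboost` package and prepared `DMatrix` objects (`dtrain`, `dvalid` are hypothetical names).

```python
# Parameters from the list above, using XGBoost's parameter names.
xgb_params = {
    "max_depth": 6,                   # maximum depth of each tree
    "eta": 0.09,                      # XGBoost's name for the learning rate
    "objective": "reg:squarederror",  # assumed: squared-error regression
    "eval_metric": "rmse",            # assumed: report RMSE during training
}
NUM_BOOST_ROUND = 250  # number of boosting iterations

# With xgboost installed and DMatrix objects prepared, training would look like:
#   import xgboost as xgb
#   model = xgb.train(xgb_params, dtrain, num_boost_round=NUM_BOOST_ROUND,
#                     evals=[(dtrain, "train"), (dvalid, "valid")])
print(xgb_params, NUM_BOOST_ROUND)
```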
After running the algorithm, we find that the average RMSE over 250 iterations is about 0.39 for the training dataset and 0.44 for the testing dataset.
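The RMSE metric itself is straightforward to compute. The sketch below also includes a variant on log-transformed durations; whether the values reported here are on a log scale is not stated in the text, so `rmsle` is included only as an assumption suggested by the log-duration plot, and both function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def rmsle(durations_true, durations_pred):
    """RMSE on log(1 + duration), assuming a log-transformed target."""
    return rmse([math.log1p(d) for d in durations_true],
                [math.log1p(d) for d in durations_pred])


# Hypothetical true and predicted durations in seconds.
actual = [455, 900, 1800]
predicted = [500, 850, 2000]
print(rmse(actual, predicted))
print(rmsle(actual, predicted))
```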
Similarly, we apply the Multi-Layer Perceptron model to a similar dataset. It requires a deep learning setup using Rectifier to eliminate outliers from the data. The results are shown in Fig. 7.
Fig. 7 Results of MLP Algorithm
The training accuracy of this algorithm is observed to be around 0.2740, while the testing accuracy is near 0.41. This shows that XGBoost is slightly better than the MLP model.
We successfully implemented both algorithms on the New York City Taxi Trip Duration dataset and drew conclusions from the results. After implementing both, we find that XGBoost is better than MLP, as it shows slightly better accuracy. This leads us to conclude that the XGBoost model is more efficient and reliable than MLP in predicting taxi trip duration.
As part of future work, the Multi-Layer Perceptron model could be auto-tuned to learn which features should be combined in order to detect interactions between them. Quantities related to the location features might also be computed in order to localize the effects of traffic on the predictions. Features based on speed limits could be incorporated to support better analysis of the datasets. Further, proximity to Central Park and the weather conditions could be modelled jointly: New Yorkers might take a taxi when they are near Central Park or when the weather is severe, but not when they are near Central Park and it is raining, since they may not visit the park in bad weather. Finally, the K-Means clustering could be enhanced with additional features such as the distance to the closest metro station or the number of bars and eateries in a given zone, so as to exploit the similarities between zones. The cluster in which each data point falls would then serve as an additional important feature for our models.
Contributor Information
M Poongodi.
Mohit Malviya.
Chahat Kumar, Email: chahatkumar3007@gmail.com.
Mounir Hamdi, Email: mhamdi@hbku.edu.qa.
V Vijayakumar.
Jamel Nebhen.
Hasan Alyamani, Email: hjalyamani@kau.edu.sa.
- Alazab M, Khan S, Krishnan SSR, Pham Q, Reddy MPK, Gadekallu TR. A Multidirectional LSTM Model for Predicting the Stability of a Smart Grid. IEEE Access. 2020; 8:85454–85463. doi: 10.1109/ACCESS.2020.2991067.
- Almathami Hassan Khader Y, Win Khin Than, Vlahu-Gjorgievska Elena. Barriers and facilitators that influence telemedicine-based, real-time, online consultation at patients’ homes: systematic literature review. J Med Internet Res. 2020; 22(2):16407. doi: 10.2196/16407.
- Ayyappa Y, Bekkanti A, Krishna A, Neelakanteswara P, Basha C (2020) “Enhanced and Effective Computerized Multi Layered Perceptron based Back Propagation Brain Tumor Detection with Gaussian Filtering”, (2020) Second International Conference on Inventive Research in Computing Applications (ICIRCA). July, p, Coimbatore, India
- Butgereit L, Martinus L (2019) “A Comparison of Four Open Source Multi-Layer Perceptrons for Neural Network Neophytes”, In: 2019 International Conference on Advances in Big Data, Computing and Data Communication Systems (icABCD). Winterton, South Africa
- Cao J, Cen G, Cen Y, Ma W (2020) “Short-Term Highway Traffic Flow Forecasting Based on XGBoost”, In: 2020 15th International Conference on Computer Science & Education (ICCSE). Delft, Netherlands
- Chinmay C, Rodrigues Joel JPC. A comprehensive review on device-to-device communication paradigm: trends, challenges and applications. Wireless Personal Commun. 2020; 114 (1):185â207. doi:Â 10.1007/s11277-020-07358-3. [ CrossRef ] [ Google Scholar ]
- Duan Zongtao, Zhang Kai, Chen Zhe, Liu Zhiyuan, Tang Lei, Yang Yun, Ni Yuanyuan. Prediction of city-scale dynamic taxi origin-destination flows using a hybrid deep neural network combined with travel time. IEEE Access. 2019; 7 :127816â127832. doi:Â 10.1109/ACCESS.2019.2939902. [ CrossRef ] [ Google Scholar ]
- Guo Z, Shen Y, Bashir AK, Imran M, Kumar N, Zhang D, Yu K (2020) Robust spammer detection using collaborative neural network in internet of thing applications. IEEE Internet of Things J 1–1. 10.1109/JIOT.2020.3003802
- Gupta A, Sharma S, Goyal S, Rashid M (2020) “Novel XGBoost Tuned Machine Learning Model for Software Bug Prediction’, 2020 International Conference on Intelligent Engineering and Management (ICIEM). United Kingdom, London
- Irio L, Ip A, Oliveira R, Luís M. An adaptive learning-based approach for vehicle mobility prediction. IEEE Access. 2021; 9 :13671â13682. doi:Â 10.1109/ACCESS.2021.3052071. [ CrossRef ] [ Google Scholar ]
- Jeyachandran A, Poongodi M (2018) Securing Cloud information with the use of bastion algorithm to enhance confidentiality and protection. Int J Pure Appl Math 118(24)
- Kabán Ata (2019) “Compressive Learning of Multi-layer Perceptrons: An Error Analysis”, In: 2019 International Joint Conference on Neural Networks (IJCNN). Budapest, Hungary
- Kankanamge KD, Witharanage YR, Withanage CS, Hansini M, Lakmal D, Thayasivam U (2019) “Taxi trip travel time prediction with isolated XGBoost Regression”, In: 2019 Moratuwa Engineering Research Conference (MERCon). Moratuwa, Sri Lanka, pp. 54–59
- Kashif BA, Suleman K, Rabadevi B, Deepa N, Alnumay WS, Gadekallu TR, Maddikunta PKR (2020) “Comparative analysis of machine learning algorithms for prediction of smart grid stability”, Int Trans Electr Energy Syst, Feb
- Khamees M, Ahmed WS, Abbas SQ (2020) “Train the Multi-Layer Perceptrons Based on Crow Search Algorithm”, In: 2020 1st. Information Technology To Enhance e-learning and Other Application (IT-ELA), Baghdad, Iraq, July
- Koo J, Faseeh QNM, Siddiqui IF, Abbas A, Bashir AK. IoT-enabled directed acyclic graph in spark cluster. J Cloud Comput. 2020; 9 (1):1â5. doi:Â 10.1186/s13677-020-00195-6. [ CrossRef ] [ Google Scholar ]
- Liao X, Cao N, Li M, Kang X (2019) “Research on Short-Term Load Forecasting Using XGBoost Based on Similar Days”, In: 2019 International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS). Changsha, China
- Maddikunta PKR, Srivastava G, Gadekallu TR, Deepa N, Boopathy P. Predictive model for battery life in IoT networks. IET Intel Transport Syst. 2020; 14 (11):1388â1395. doi:Â 10.1049/iet-its.2020.0009. [ CrossRef ] [ Google Scholar ]
- Montiel J, Mitchell R, Frank E, Pfahringer B, Abdessalem T, Bifet A (2020) “Adaptive XGBoost for Evolving Data Streams”, In: 2020 International Joint Conference on Neural Networks (IJCNN). Glasgow, United Kingdom
- Muhammad LJ, Algehyne EA, Usman SS, Ahmad A, Chinmay C, Mohammed IA (2021) Supervised machine learning models for prediction of COVID-19 infection using epidemiology dataset. SN Comput Sci [ PMC free article ] [ PubMed ]
- Poongodi M, Ashutosh Sharma, Vijayakumar V, Vaibhav Bhardwaj, Parkash Sharma Abhinav, Razi Iqbal, Rajiv Kumar. Prediction of the price of Ethereum blockchain cryptocurrency in an industrial finance system. Comput Electr Eng. 2020; 81 :106527. doi:Â 10.1016/j.compeleceng.2019.106527. [ CrossRef ] [ Google Scholar ]
- Poongodi M, Vijayakumar V, Chilamkurti N. Bitcoin price prediction using ARIMA model. Int J Int Technol Secured Trans. 2020; 10 (4):396â406. doi:Â 10.1504/IJITST.2020.108130. [ CrossRef ] [ Google Scholar ]
- Poongodi M, Vijayakumar V, Rawal B, Bhardwaj V, Agarwal T, Jain A, Ramanathan L, Sriram VP. Recommendation model based on trust relations & user credibility. J Intell Fuzzy Syst. 2019; 36 (5):4057â4064. doi:Â 10.3233/JIFS-169966. [ CrossRef ] [ Google Scholar ]
- Poongodi M, Hamdi M, Vijayakumar V, Rawal BS (2020b) and ”, 2020 IEEE 3rd 5G World Forum (5GWF). Bangalore, India, pp 1–6
- Ran D, Jiaxin H, Yuzhe H (2020) “Application of a Combined Model based on K-means++ and XGBoost in Traffic Congestion Prediction”, In: 2020 5th International Conference on Smart Grid and Electrical Automation (ICSGEA). Zhangjiajie, China
- Sharma R, Schommer C, Vivarelli N (2020) “Building up Explainability in Multi-layer Perceptrons for Credit Risk Modeling”, In: 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA). Australia, Sydney
- Sunindyo WD, Satria ASM (2020) “Traffic Congestion Prediction Using Multi-Layer Perceptrons And Long Short-Term Memory”, In: 2020 10th Electrical Power. Electronics, Communications, Controls and Informatics Seminar (EECCIS), Malang, Indonesia
- Suresh P, Sundresan P, Mujahid T, Ganthan N, Chinmay C, Saju M, Zeeshan B, Mohammad TQ (2021) ANN base novel approach to detect node failure in wireless sensor network, CMC-Computers. Tech Science Press, Materials & Continua
- Tang Q, Xia G, Zhang X, Long F (2020) “A Customer Churn Prediction Model Based on XGBoost and MLP”, In: 2020 International Conference on Computer Engineering and Application (ICCEA). Guangzhou, China
- Wang X, Lou XY, Hu SY, He SC (2020) “Evaluation of Safe Driving Behavior ofTransport Vehicles Based on K-SVM-XGBoost”, In: 2020 3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE). Shenzhen, China
- Wu X, Li Y, Wu H, Zhang F, Sun K (2019) “A hybrid variable selection algorithm for multi-layer perceptron with nonnegative garrote and extremal optimization”, In: 2019 19th International Conference on Control, Automation and Systems (ICCAS). Jeju, Korea (South)
- Yang B, He Y, Liu H, Chen Y, Ji Z (2020) “A Lightweight Fault Localization Approach based on XGBoost”, 2020 IEEE 20th International Conference on Software Quality, Reliability and Security (QRS). Macau, China
- Zhang C, Zhu F, Wang X, Sun L, Tang H, Lv Y (2020) Taxi demand prediction using parallel multi-task learning model. IEEE Trans Intell Trans Syst 1–10. 10.1109/TITS.2020.3015542
Predicting New York Taxi Trip Duration Based on Regression Analysis Using ML and Time Series Forecasting Using DL
- Conference paper
- First Online: 23 August 2022
- S. Ramani,
- Anish Ghiya,
- Pusuluri Sidhartha Aravind,
- Marimuthu Karuppiah &
- Danilo Pelusi
Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 458)
Taxi fare and trip duration depend on many factors, such as traffic along the route or late-night driving, which may be somewhat slower because of reduced visibility at night. This work visualizes the factors that might affect trip duration, such as day of the week, pickup location, drop-off location and time of pickup. It analyses the dataset obtained from the NYC Taxi and Limousine Commission (TLC), which contains taxi trips from January 2016 to June 2016 with GPS coordinates. Trip duration is predicted using multiple machine learning and deep learning models, and the models are compared on mean squared error and \(R^2\) score, with and without scaling of the data. The maximum \(R^2\) score of 0.99 was attained by the recurrent neural network (RNN) using time series analysis, followed by 0.97 with XGBRegressor. When the problem was analysed from a regression perspective, normalizing the values with a log transform gave an increment of 0.6%.
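The two evaluation metrics named in the abstract, and the effect of a log transform on them, can be illustrated with a minimal sketch. The durations and predictions below are hypothetical values chosen for illustration, not figures from the chapter, and `log1p` is assumed here as one common form of log transform; the chapter does not state which form it used.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical trip durations in seconds (assumed, not from the chapter).
durations = [300.0, 620.0, 1250.0, 2400.0, 5400.0]
predicted = [350.0, 600.0, 1100.0, 2600.0, 4800.0]

# Scores on the raw scale: long trips dominate the squared error.
raw_mse = mse(durations, predicted)
raw_r2 = r2_score(durations, predicted)

# Scores after log(1 + x), which compresses the long tail of durations.
log_true = [math.log1p(d) for d in durations]
log_pred = [math.log1p(p) for p in predicted]
log_mse = mse(log_true, log_pred)
log_r2 = r2_score(log_true, log_pred)

print(f"raw: MSE={raw_mse:.1f}, R2={raw_r2:.4f}")
print(f"log: MSE={log_mse:.4f}, R2={log_r2:.4f}")
```

Because the two scales weight errors differently, scores computed before and after the transform are not directly comparable unless predictions are mapped back to the original scale first.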
Author information
Authors and affiliations.
School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India
S. Ramani, Anish Ghiya & Pusuluri Sidhartha Aravind
Department of Computer Science and Engineering, SRM Institute of Science and Technology, Delhi-NCR Campus, Ghaziabad, Uttar Pradesh, 201204, India
Marimuthu Karuppiah
Faculty of Communications Sciences, University of Teramo, Teramo, Italy
Danilo Pelusi
Corresponding author
Correspondence to Danilo Pelusi.
Editor information
Editors and affiliations.
Gnanmani College of Engineering and Technology, Namakkal, India
Jennifer S. Raj
Department of Computer Science, Kennesaw State University, Kennesaw, GA, USA
Faculty of Communication Sciences, University of Teramo, Teramo, Italy
Automatics and Applied Software, Aurel Vlaicu University of Arad, Arad, Romania
Valentina Emilia Balas
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper.
Ramani, S., Ghiya, A., Aravind, P.S., Karuppiah, M., Pelusi, D. (2022). Predicting New York Taxi Trip Duration Based on Regression Analysis Using ML and Time Series Forecasting Using DL. In: Raj, J.S., Shi, Y., Pelusi, D., Balas, V.E. (eds) Intelligent Sustainable Systems. Lecture Notes in Networks and Systems, vol 458. Springer, Singapore. https://doi.org/10.1007/978-981-19-2894-9_2
DOI: https://doi.org/10.1007/978-981-19-2894-9_2
Published: 23 August 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2893-2
Online ISBN: 978-981-19-2894-9
eBook Packages: Intelligent Technologies and Robotics (R0)
TRIP DURATION PREDICTION: NEW YORK TAXI RIDES USING XGBoost (NYC Taxi Trip Duration Dataset)
Predicting trip duration is not a new problem. With the Google Maps API, one can obtain the estimated time it would take to move between two points in a city. However, a detailed analysis of the factors affecting a trip between two points can be very useful for accurate and robust prediction. Trip duration is not as simple as it seems: it is data dependent and is governed by many factors apart from distance and speed. This research focuses on the important factors that can be used as attributes for trip duration prediction in New York City. The results can be used by taxi vendors to provide better service to users. The work not only builds a prediction model but also gives an in-depth analysis of the factors associated with New York City taxi trips. A city like New York is expected to show considerable variation in trip durations. The dataset used for training and testing is multi-dimensional and requires substantial pre-processing. The work applies machine learning algorithms such as linear regression, random forests, lasso regression and XGBoost. XGBoost is the final algorithm used, as it yields the best result among the methods compared. A root mean square error of 0.4409 was achieved when test data consisting of about 600,000 data points was given as input to the trained model.
Related Papers
Vol. 19 No. 2 FEBRUARY 2021 International Journal of Computer Science and Information Security (IJCSIS)
Journal of Computer Science IJCSIS
Travel time plays a crucial role in intelligent transport systems in metropolitan cities. Accurate prediction of taxi trip travel time helps commuters plan their trips better and reach their destinations on time. Most existing techniques use supervised learning models to estimate travel time, but the performance obtained from supervised models alone is insufficient. In this paper, we propose a novel approach that predicts travel time by using both supervised and unsupervised techniques with a large historic dataset, and we compare this method with purely supervised techniques. Clustering, an unsupervised technique, segments nearby location data into similar groups, which helps reveal the underlying pattern within the large dataset; a supervised algorithm is then applied to the clustered data. The Machine Learning (ML) techniques used to predict taxi trip time are Random Forest Regressor (RFR) and XGBoost Regressor (XGBR), which are supervised, and RFR with k-means and XGBR with k-means, which combine supervised and unsupervised techniques. The results show that the combination of supervised and unsupervised models performs better than supervised models alone. The comparison also shows that RFR and RFR with k-means perform better than XGBR and XGBR with k-means, respectively. RFR with k-means outperforms the other models with an accuracy of 84.6% and also reduces the error rate of the model significantly.
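The cluster-then-regress idea in this abstract can be sketched as follows, assuming a plain Lloyd's k-means over pickup coordinates and using the resulting cluster label as an extra input column. The coordinates, the deterministic seeding and the function names are illustrative assumptions, not the authors' implementation; Euclidean distance on latitude and longitude is used only as a rough proxy over a small area.

```python
import math

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]

def k_means(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, assign(points, centroids)

# Hypothetical (latitude, longitude) pickup points in two loose groups.
pickups = [
    (40.75, -73.99), (40.64, -73.78), (40.76, -73.98),
    (40.65, -73.79), (40.74, -73.99), (40.64, -73.77),
]
centroids, labels = k_means(pickups, k=2)

# The cluster label becomes one more column for the supervised model.
features = [(lat, lon, label) for (lat, lon), label in zip(pickups, labels)]
print(labels)
```

A regressor such as a random forest or XGBoost would then be trained on `features` in place of the raw coordinates alone.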
IRJET Journal
Taxi demand prediction is the process of using historical data to forecast future taxi requests in a particular area. Accurate, real-time demand forecasting lets managers pre-allocate taxi resources in cities, which helps drivers find clients more quickly and cuts down on passenger waiting times. This project aims to choose the best model for predicting taxi demand using machine learning techniques such as regression analysis and time series forecasting. The models used include baseline moving averages (simple, weighted, and exponential), linear regression with grid search, random forest regressor with random search, and XGBoost regressor with random search. The metrics obtained are used to determine which model is most suitable for predicting the output.
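The three moving-average baselines named above can be sketched in a few lines. The demand series, the window length and the smoothing factor `alpha` are illustrative assumptions; the abstract does not give the settings that were used.

```python
def simple_moving_average(series, window):
    """Forecast the next value as the mean of the last `window` values."""
    recent = series[-window:]
    return sum(recent) / len(recent)

def weighted_moving_average(series, window):
    """Like the simple average, but newer values get linearly larger weights."""
    recent = series[-window:]
    weights = range(1, len(recent) + 1)  # oldest gets 1, newest gets len(recent)
    return sum(w * x for w, x in zip(weights, recent)) / sum(weights)

def exponential_moving_average(series, alpha):
    """Exponentially smoothed level; `alpha` in (0, 1] sets how fast old data fades."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

# Hypothetical pickup counts per time bin for one zone.
demand = [10, 12, 14, 20]
print(simple_moving_average(demand, window=2))
print(weighted_moving_average(demand, window=2))
print(exponential_moving_average(demand, alpha=0.5))
```

Baselines like these give a floor that the regression and tree-based models need to beat before their extra complexity is justified.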
Dillip Rout
International Journal of Scientific Research in Computer Science, Engineering and Information Technology
International Journal of Scientific Research in Computer Science, Engineering and Information Technology IJSRCSEIT
Accurately predicting the travel time between two destinations is an essential aspect of traffic monitoring and facilitating ridesharing services. However, this is a highly complex and challenging task, which involves a multitude of variables that cannot be resolved straightforwardly. Previous studies on travel time prediction have focused on evaluating the duration of individual road segments or specific sub-paths before integrating the necessary time for each sub-path. While this method may provide some insight, it may result in an incorrect or imprecise time estimate. To address this issue, this research aims to utilize machine learning techniques to predict the duration of trips in ride-sharing networks, by utilizing the Uber movement dataset. The proposed system employs Python programming to calculate the distance between the pickup and drop-off locations. Furthermore, the study explores the various factors that affect travel time in a descriptive analysis. This includes examining the impact of traffic congestion, weather conditions, and road construction on travel time. The suggested approach incorporates a robust regression model known as Huber regression to enhance the accuracy of trip duration prediction and increase the precision of the algorithm. The Huber regression model is robust to outliers, making it suitable for the Uber movement dataset, which may contain unexpected and extreme values. The dataset is processed using k-fold cross-validation, which splits the dataset into k subsets, with each subset used for validation once while the remaining subsets used for training the model. However, this approach presents several challenges that need to be addressed, including the difficulties with tracking variables, the need for extensive data transformation due to the diverse data types contained in the dataset, and the challenge of handling unlabeled places during the segmentation of geographical data. 
Additionally, outliers in the dataset can lead to substantial data differences and affect the model's accuracy. Data normalization is slow due to the time-consuming nature of reading duplicated information. To mitigate these issues, additional study is required to improve the model's layout and address the challenges of working with the Uber movement dataset.
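Three building blocks mentioned in this abstract can be sketched in plain Python: a pickup-to-drop-off distance, the Huber loss that makes the regression robust to outliers, and a k-fold split. The abstract does not say which distance formula was used, so the haversine (great-circle) formula is an assumption here, as are the function names, the Earth radius and the default `delta`.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

def huber_loss(residual, delta=1.0):
    """Quadratic for small residuals, linear beyond `delta`, which damps outliers."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r ** 2
    return delta * (r - 0.5 * delta)

def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists; each sample is validated exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

# Usage with hypothetical values: 10 samples split into 3 folds.
for train, validation in k_fold_indices(10, 3):
    print(len(train), len(validation))
```

The linear tail of the Huber loss is what limits the influence of extreme trip durations: a residual three times `delta` costs 2.5 here, whereas the squared-error term would charge 4.5.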
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
IJRASET Publication
Taxis play a crucial role in transportation, especially in urban areas. Predicting future taxi demand in a particular geographical location would greatly help internet-based transportation companies such as Ola and Uber. It can drastically decrease the waiting time of passengers, and it helps taxi drivers move to locations where demand is high, which benefits passengers, drivers and companies. In this project we predict the demand for taxis in a particular location for the next 10 min using previous time series data. We first treat the task as a regression problem using machine learning models, then apply deep learning models and compare the results, with the aim of proposing the best-suited and most accurate model for the problem. This would greatly help companies manage taxi fleets in cities.
The Land Transportation Sector is one of the key sectors in the Philippine economy, particularly in Metro Manila. With the rapid urbanization of the Philippines, the urban transport infrastructure is expected to come under pressure, posing a major risk of urban transport degradation that results in longer travel times and in economic and productivity losses. In light of this, the Land Transportation Franchising and Regulatory Board (LTFRB), along with DOST-ASTI, has initiated a project to implement a bus management system for Public Utility Vehicles utilizing real-time GPS location data. This study establishes a travel time prediction for buses on a given route. The travel time estimation was performed using Extremely Randomized Trees, a supervised machine learning algorithm. The resulting prediction set had a coefficient of determination score indicative of good predictive performance for travel time prediction.
arXiv (Cornell University)
Human-centric intelligent systems
Prof. Arnab K. Laha
This research studies predictive analysis, a method of analysis in machine learning. Many companies, such as Ola and Uber, use artificial intelligence and machine learning technologies to solve the problem of accurate fare prediction. This paper presents a comparative analysis of regression and classification algorithms, which are useful in prediction modeling, to obtain the most accurate value. The research will be helpful to those involved in fare forecasting. Previously, the fare depended only on distance, but with advances in technology the cab's fare depends on many factors, such as time, location, number of passengers, traffic, number of hours and base fare. The study is based on supervised learning, one application of which is prediction.
Marco A. Casanova
This paper investigates the application of a Machine Learning technique to predict the time a vehicle will spend travelling between any two points in an approximated area. The prediction relies on a learning process that uses historical data about vehicle movements and takes into account a set of semantic variables to estimate the time.
NYC taxi drivers could get charged full congestion pricing toll
Some New York City taxi drivers could get hit by the full congestion pricing charge if one of the city’s two taxi technology companies doesn’t play ball with the MTA.
Curb, one of the taxi-tech firms that runs credit card readers for taximeters in the city, is attempting to charge the MTA a service fee in order to add a per-trip congestion toll on top of any taxi trips within the congestion pricing zone, the Daily News has learned.
The MTA’s congestion pricing plan, which will charge a base toll of $15 per day to cars entering Manhattan at 60th St. or below, includes a carve-out for taxis and other for-hire vehicles.
Rather than charge taxi drivers the $15 toll, the congestion pricing plan will assess a per-trip surcharge for trips through the congestion zone, passing the toll on to passengers.
The surcharge will be $1.25 for taxis and $2.50 for Uber and Lyft trips.
But that scheme requires the app and meter companies to sign an agreement with the MTA to render that surcharge to the transit agency, according to a letter sent by the Taxi and Limousine Commission Tuesday to medallion owners, livery base operators, and others in the industry.
“[Ubers and Lyfts] dispatched by a base that has not entered into the agreement, as well as taxis and [green cabs] utilizing a [meter company] that has not entered into the agreement, may be charged a $15 toll up to once per day paid by the vehicle owner, instead of the per-trip charge paid by the passenger, for entering the Congestion Relief Zone,” the TLC letter reads.
Meter companies “must agree to collect and remit the per-trip charge in order for the per-trip charge to apply to trips completed in vehicles that are equipped with that [company]’s technology system.”
Taxi and transit sources told The News Wednesday that Curb has been haggling with the MTA, trying to assess a service fee for sending the congestion toll to the transit agency.
A spokesman for Curb rejected the idea the company was “haggling.”
Curb spokesman Zak Hawke on Thursday said the firm was cooperating with the MTA and engaged in what he called “a typical review” of the agreement.
“[T]here is a significant cost associated with the technology, manpower and infrastructure needed to collect and reconcile the mandated Congestion Toll Zones fees on behalf of the MTA,” he said.
“It is reasonable for these costs to be acknowledged and for Curb to be compensated for these efforts, either now or in the future.”
Curb is committed to an on-time implementation of congestion pricing tolls, the spokesman said.
MTA chairman Janno Lieber characterized pending agreements with meter companies and livery bases as a “paperwork” issue.
“We need folks who are going to be responsible for the interactions with the MTA and the collection of the money to have agreed with the process,” Lieber said.
Lieber did not name Curb or any other firm by name when asked about the situation Wednesday.
“The drivers themselves are the ones we’re trying to help,” he said. “We just don’t want them to get stuck if the company that is responsible for processing this - for their own reasons - does not execute an agreement.”
“Maybe some [drivers] will choose to move to another meter company,” he added.
Asked if the MTA would be open to a nominal fee, Lieber said no.
“I don’t think that’s called for,” he said. “There are a lot of taxes and fees that are already collected through these meters.”
Curb is one of two major players in the city’s taxi-meter industry. It’s preferred by owner-operator drivers over rival system Arro, owned by CMT Group, which is predominantly used by larger taxi fleets.
Sources said roughly 75% of taxis in the city have meters run by Curb.
A TLC spokesman Wednesday said that the city agency was not part of any arrangement between livery bases or meter companies and the MTA, but reiterated the TLC commissioners support of a per-trip charge.
“Hopefully MTA and these providers can come to an agreement that serves everyone,” TLC spokesman Jason Kersten said in a statement.
B’hairavi Desai, head of the New York Taxi Workers Alliance, said that while she doubted cabbies would be hit with the full $15, she worried that Curb or other providers might try to pass any fee the MTA rejected onto drivers instead.
Desai told The News that cabbies should be able to pay the MTA themselves.
“If Curb or Arro are going to present a problem, let the medallion owners pay directly,” she said.
Congestion pricing, which is currently facing legal challenges in New York and New Jersey, is expected to begin on June 30.
©2024 New York Daily News. Visit nydailynews.com. Distributed by Tribune Content Agency, LLC.
NYC's best and worst times to travel on Memorial Day Weekend
By Jesse Zanger
Updated on: May 24, 2024 / 12:55 PM EDT / CBS New York
NEW YORK - If you're going to hit the road this Memorial Day Weekend for the unofficial start of summer, you're not alone.
Holiday travel is already underway in and around New York City ahead of the long weekend.
AAA projects nearly 44 million people will be traveling more than 50 miles from May 23 through 27. That's a 4% hike from last year and, for the first time, will exceed pre-pandemic levels.
"We're projecting an additional one million travelers this holiday weekend compared to 2019, which not only means that we're moving beyond pandemic-era lulls, but also signals a very busy summer travel season ahead," Alec Slatky, of AAA Northeast, said.
That means a record amount of road trips are expected. AAA estimates 38.4 million people will travel by car, the highest number for the holiday ever recorded since AAA began tracking in 2000.
Some 3.5 million people will be flying this week, a 4.8% hike over last year.
A record 6.4 million people are expected to use Port Authority airports, bridges and tunnels, the Port Authority said. Passengers on domestic flights should arrive at the airport at least two hours in advance, and for international travel, at least three hours is recommended. Here are some air travel tips from the Port Authority.
Here are the worst and best times to hit the road, according to transportation analytics company INRIX:
Thursday, May 23: Worst travel time 12 - 6 p.m. | Best travel time before 11 a.m., after 7 p.m.
Friday, May 24: Worst travel time 12 - 6 p.m. | Best travel time before 11 a.m., after 7 p.m.
Saturday, May 25: Worst travel time 2 - 5 p.m. | Best travel time before 1 p.m., after 6 p.m.
Sunday, May 26: Worst travel time 3 - 7 p.m. | Best travel time before 1 p.m.
Monday, May 27: Worst travel time 3 - 7 p.m. | Best travel time after 7 p.m.
AAA says booking data shows New York City is the #3 domestic travel destination for the holiday weekend.
Busiest NYC bridges and tunnels for Memorial Day Weekend
Here's a closer look at last year's data on how the various bridges and tunnels stack up, in terms of volume, over the weekend.
*% change from 2019 and 2013 is calculated using only Brooklyn-bound Verrazzano traffic from 2023, to account for the implementation of two-way tolling in 2020.
The busiest times to travel, according to AAA, are expected to be Thursday and Friday afternoons, when commuters and travelers will both be on the roads.
A closer look at the Verrazzano, Throgs Neck + Whitestone Bridges
Verrazzano Bridge
- Brooklyn-bound: The busiest time, other than the Thursday and Friday morning commutes, is expected to be Sunday in the late afternoon and evening and Monday evening, according to AAA. In 2023, there were more than 6,500 cars per hour from 1 - 8 p.m. Sunday and 5 - 8 p.m. Monday.
- Staten Island-bound: Thursday evening and Friday late afternoon. Last year, there were more than 8,000 cars per hour from 4 - 7 p.m. Thursday and 3 - 5 p.m. Friday.
Throgs Neck + Whitestone Bridges
- Bronx-bound: The busiest times are Saturday morning and Sunday morning. In 2023, there were more than 8,000 cars per hour from 9 a.m. - 1 p.m. Saturday and 11 a.m. - 1 p.m. Sunday.
- Queens-bound: The busiest times are Thursday evening and Sunday evening. Last year, there were more than 9,000 cars per hour from 4 - 7 p.m. Thursday and more than 8,700 cars per hour from 3 - 7 p.m. Sunday.
PATH train service
PATH Trains will operate on a Saturday schedule on Memorial Day.
Jesse Zanger is managing editor of CBS New York. Jesse has previously worked for the Fox News Channel and Spectrum News NY1. He covers regional news around the Tri-State Area, with a particular focus on breaking news and extreme weather.
New York City congestion pricing, first in the nation, is approved at $15 and up for vehicles
A majority of the MTA board voted Wednesday in favor of New York City congestion pricing, green-lighting the controversial plan that will charge cars $15 to enter Manhattan below 61st Street and hit trucks with even higher tolls starting in just a few months.
Only one of the 12 board members opposed the proposal. The no vote was Nassau County board member David Mack.
The approval is essentially a rubber stamp of "clarifications" such as exemptions, since the plan itself was approved last year. It means congestion pricing can begin following a 60-day public information campaign and a concurrent 30-day testing period.
Almost all 110 toll readers are already installed, positioning the MTA to begin collecting as soon as June 15. Federal judges on either side of the Hudson River could still block the plan, though the MTA expects that not to be the case.
The board overwhelmingly voted in favor of the plan in December, saying charging drivers to enter a swath of Manhattan would contribute millions of dollars to the aging, cash-strapped transit system. Wednesday's vote is a critical final approval of "clarifications" and exemptions.
As NBC New York reported earlier this week, most of the cars likely to get full exemptions will be government vehicles.
The toll will not be in effect for taxis, but drivers will be charged a $1.25 surcharge per ride. The same policy applies to Uber, Lyft and other rideshare drivers, though their surcharge will be $2.50.
Despite what MTA officials say were overwhelming public comments "in favor" of congestion pricing by a 2-to-1 margin, a number of groups have stood in opposition.
Taxi advocates have blasted the plan, calling it "a reckless proposal that will devastate an entire workforce."
Public hearings earlier in March paved the way for Wednesday's vote. For its part, the MTA has insisted that it is merely implementing a state law aimed at cleaning the air and modernizing mass transit.
How does congestion pricing work?
Congestion pricing will impact any driver entering what is being called the Central Business District (CBD), which stretches from 60th Street in Manhattan and below, all the way down to the southern tip of the Financial District. In other words, most drivers entering midtown Manhattan or below will have to pay the toll, according to the board's report.
All drivers of cars, trucks, motorcycles and other vehicles would be charged the toll. Different vehicles will be charged different amounts. Here's a breakdown of the prices:
Passenger vehicles: $15
Small trucks (like box trucks, moving vans, etc.): $24
Large trucks: $36
Motorcycles: $7.50
The $15 toll is about a midway point between previously reported possibilities, which have ranged from $9 to $23.
The full, daytime rates will be in effect from 5 a.m. until 9 p.m. each weekday, and 9 a.m. until 9 p.m. on the weekends. The board called for toll rates in the off-hours (from 9 p.m.-5 a.m. on weekdays, and 9 p.m. until 9 a.m. on weekends) to be about 75% less, or about $3.50 instead of $15 for a passenger vehicle.
Drivers will only be charged to enter the zone, not to leave it or stay in it. That means residents who enter the CBD and circle their block to look for parking won't be charged.
Only one toll will be levied per day, so anyone who enters the area, then leaves and returns, will still only be charged the toll once for that day.
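The rate rules above can be sketched in a few lines of Python. This is only an illustration of the figures quoted in the article, not the MTA's billing logic. The function names, the flat 25% off-peak factor (the article says "about 75% less" and quotes about $3.50 for passenger cars, so real off-peak rates will differ slightly), and the choice to price each day at its first entry are all assumptions.

```python
from datetime import datetime

# Daytime rates quoted in the article, in dollars.
DAYTIME_TOLL = {
    "passenger": 15.00,
    "small_truck": 24.00,
    "large_truck": 36.00,
    "motorcycle": 7.50,
}

# Assumption: "about 75% less" applied as a flat factor to every vehicle class.
OFF_PEAK_FACTOR = 0.25


def is_daytime(ts: datetime) -> bool:
    """Daytime is 5 a.m.-9 p.m. on weekdays and 9 a.m.-9 p.m. on weekends."""
    start_hour = 5 if ts.weekday() < 5 else 9
    return start_hour <= ts.hour < 21


def entry_toll(vehicle: str, ts: datetime) -> float:
    """Toll for a single entry into the zone at time ts."""
    rate = DAYTIME_TOLL[vehicle]
    return rate if is_daytime(ts) else round(rate * OFF_PEAK_FACTOR, 2)


def total_charges(vehicle: str, entries: list) -> float:
    """Sum of tolls, charging at most once per calendar day.

    Assumption: the day's single charge is priced at the first entry that day.
    """
    first_entry = {}
    for ts in sorted(entries):
        first_entry.setdefault(ts.date(), ts)
    return sum(entry_toll(vehicle, ts) for ts in first_entry.values())
```

For example, a passenger car entering twice on a Monday during the day and once late on Tuesday night would be charged once for each of the two days under these assumptions.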
The review board said that implementing their congestion pricing plan is expected to reduce the number of vehicles entering the area by 17%. That would equate to 153,000 fewer cars in that large portion of Manhattan. They also predicted that the plan would generate $15 billion, a cash influx that could be used to modernize subways and buses.
Can I get a discount?
Many groups had been hoping to get exemptions, but very few will avoid having to pay the toll entirely. That small group is limited to specialized government vehicles (like snowplows) and emergency vehicles.
Low-income drivers who earn less than $50,000 a year can apply to pay half the price on the daytime toll, but only after the first 10 trips in a month.
While not an exemption, there will also be so-called "crossing credits" for drivers using any of the four tunnels to get into Manhattan. That means those who already pay at the Lincoln or Holland Tunnel, for example, will not pay the full congestion fee. The credit amounts to $5 per ride for passenger vehicles, $2.50 for motorcycles, $12 for small trucks and $20 for large trucks.
Drivers from Long Island and Queens using the Queens-Midtown Tunnel will get the same break, as will those using the Brooklyn-Battery Tunnel. Those who come over the George Washington Bridge and go south of 60th Street would see no such discount, however.
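As a rough worked example of the crossing credits, the sketch below subtracts the quoted credit from the quoted daytime toll when a driver arrives through one of the four tunnels named in the article. The function and dictionary names are hypothetical, and applying the credit only to the daytime rate is an assumption, since the article does not say how credits interact with off-peak tolls.

```python
# Daytime tolls and tunnel crossing credits quoted in the article, in dollars.
DAYTIME_TOLL = {"passenger": 15.00, "motorcycle": 7.50,
                "small_truck": 24.00, "large_truck": 36.00}
CROSSING_CREDIT = {"passenger": 5.00, "motorcycle": 2.50,
                   "small_truck": 12.00, "large_truck": 20.00}

# The four tunnels the article says qualify for a credit.
CREDITED_TUNNELS = {"Lincoln", "Holland", "Queens-Midtown", "Brooklyn-Battery"}


def net_daytime_toll(vehicle: str, crossing: str = "") -> float:
    """Daytime congestion toll after any tunnel crossing credit.

    Assumption: the credit applies only to the daytime rate.
    """
    toll = DAYTIME_TOLL[vehicle]
    if crossing in CREDITED_TUNNELS:
        toll -= CROSSING_CREDIT[vehicle]
    return toll


print(net_daytime_toll("passenger", "Holland"))                   # tunnel credit applied
print(net_daytime_toll("passenger", "George Washington Bridge"))  # no credit
```

Under these assumptions a passenger car arriving through the Holland Tunnel pays $10 instead of $15, while one arriving over the George Washington Bridge pays the full $15.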
Public-sector employees (teachers, police, firefighters, transit workers, etc.), those who live in the so-called CBD, utility companies, those with medical appointments in the area and those who drive electric vehicles had all been hoping to be granted an exemption. They didn't get one.
UFT President Michael Mulgrew, one of the lead plaintiffs in a federal lawsuit against congestion pricing, said following the MTA approval that now it's the courts' job to step in.
"Now that the MTA board has voted, it is going to be up to the courts to prevent the huge environmental injustice that threatens families outside the Manhattan congestion zone, including communities that are already suffering some of the worst air pollution and asthma rates in the country," Mulgrew said.
Andrew Siff is a reporter for NBC New York.
Street Wars
The Battle for the Streets of New York
Now more than ever, the city is being forced to rethink how its thoroughfares are used.
By Dodai Stewart Illustrations by Leon Edler
New York City streets and sidewalks have always been crowded, but it's never been like this.
There are more people, more cars and more bicycles. And that's not all.
Dining sheds are squeezed beside bike lanes. Home delivery has exploded, ushering in more e-bikes, cargo bikes and trucks.
It's all crammed into streets laid out over 200 years ago. The result? A chaotic struggle for space unlike any the city has ever seen.
On a recent morning, the intersection of East 77th Street and Lexington Avenue presented a vivid illustration of the tumult.
A taxi trying to make a left turn had to maneuver around a Verizon crew digging up the asphalt. A box truck was parked in the bus lane, and the M102 bus, with its accordionlike belly, was forced to change lanes and snake around it.
Dozens of people streamed out of the subway and into the crosswalk. A man pushing a double stroller navigated between the subway entrance and a sidewalk compost box. A woman's shopping cart wheels got stuck in a crack in the sidewalk. CitiBikes and delivery bikes whizzed by. A cargo bike stopped in front of a FedEx truck that was unloading packages next to a bike lane.
Lively, energetic streets make city living attractive: people to watch, windows to browse, benches to sit on, trees for shade.
But lately, New York City streets are teetering between lively and unlivable. Residents clash over traffic, noise, parking, 5G towers and heaps of trash. Most years, far fewer pedestrians get killed by motorists than in generations past, but last year was the deadliest year for cyclists since 1999.
Still, people who have thought deeply about the state of the city's streets believe dramatic improvement may be on the way, if New York is willing to seize the moment.
That's because the city is about to embark on the nation's first congestion pricing plan, charging most drivers $15 to enter much of Manhattan below 60th Street and forcing many commuters to find a different way into the city.
The aim is to reduce car traffic in one of the world's busiest commercial districts and raise money for public transportation.
People, bikes and vehicles compete for space on New York City's streets.
Karsten Moran for The New York Times
"I think this could be the catalyst for a streets renaissance in New York," Janette Sadik-Khan, New York City's former transportation commissioner, said in a recent interview.
"We have to talk about how we're going to reclaim that space and make it work for people."
Of course, congestion pricing, too, comes with a fight. The plan is supposed to start in June, but it faces several lawsuits brought by elected officials and residents from across the region, who describe it as ill-conceived and unfair to commuters who drive because public transit isn't robust enough to serve their needs.
"They don't drive because they want to," said Susan Lee, a member of a coalition called New Yorkers Against Congestion Pricing. "They don't want to sit in traffic."
Could congestion pricing actually reduce the number of cars in the city to a dramatic extent? If so, what would take their place?
There are other ideas and experiments in the works for taming New York's streets, and they raise questions of their own. Could a proposal to ban parking close to intersections improve public safety? Will the Sanitation Department's garbage containerization plan make sidewalks cleaner? Is there a way to keep package delivery trucks from blocking the streets? Must 5G technology create public eyesores in residential neighborhoods?
In the months ahead, The New York Times will examine the debates raging in neighborhoods all over the city about who and what gets to take up space on New York's streets and sidewalks.
How did we get here?
Orchestrating the flow of traffic and pedestrians has been a complicated and emotional project for centuries.
New York City's streets were laid out before anyone knew how they would ultimately be used, long before cars were even invented. The first city planners could not have anticipated Uber vehicles, let alone Amazon deliveries or commuters on electric scooters.
In New York's earliest days, the streets were a free-for-all. People walked or rode horses. There were no crosswalks or stoplights; if you had to cross the street, you simply walked across the street.
Traffic on Broadway in 1859 consisted of pedestrians, horse-drawn carts and streetcars.
William Notman, via Getty Images
Soon, horse-drawn vehicles used the streets alongside pedestrians, and people dashed between them. (Later, New Yorkers dodged streetcars in much the same way, giving the Brooklyn baseball team its name.)
The arrival of bicycles neatly encapsulated the city's ever-shifting debate over how the streets should be used, and by whom.
By the 1890s, the streets were full of bikes. Men and women took to cycling through the city so quickly, and dangerously, that it was called "scorching."
About 100 years later, in 1987, speeding bike messengers were deemed so dangerous that bicycles were banned from Midtown, temporarily.
Today, the city encourages residents and visitors to ride bikes. New York has bike lanes and a flourishing bike share program, plus an explosion of food delivery powered by e-bikes. The renewed popularity has also come at a grave cost: Last year 30 cyclists were killed on city streets, and 395 were severely injured.
"It's hard to say whether it's the best of times or the worst of times for bicycling," said Jon Orcutt, the director of advocacy at Bike New York and the former policy director at the city's Department of Transportation. "More people are doing it than ever."
"If you're not killed, squished like a bug, you can bike across town in 10 minutes," he added. "It's easy. It's really efficient."
Enter the car, and the car crash
On the evening of Sept. 13, 1899, Henry Hale Bliss, a 69-year-old real estate broker, was riding a Manhattan streetcar on his commute home.
At 74th Street and Central Park West, Mr. Bliss stepped from the streetcar and into the street, where he was immediately hit by a taxi. He died on the scene and is recognized as the first person in the United States to be killed by a car. There is a plaque at the intersection commemorating his death.
"At the end of the Gilded Age, right before World War I, suddenly, there were motor vehicles everywhere," said James Nevius, an author and New York historian.
The development meant people could move around faster, but it also put more people in danger.
In 1920, there were about 200,000 registered vehicles in New York City; by 1925 that number had more than doubled. A century later, that figure is two million.
This scene of Park Avenue near 57th Street was typical of 1930s traffic. Over 10 million cars went through the Holland Tunnel in 1930.
George Rinhart/Corbis, via Getty Images
And yet New Yorkers are still using the same streets that were laid out generations ago. In Manhattan, the rigid street grid was designed in 1811. Avenues are 100 feet across. Cross streets are 60 feet wide, including the space for sidewalks on both sides.
That's 720 inches in which to fit not just cars but also pedestrians, baby strollers, trash, compost, scaffolding, bicycles, e-bikes, scooters, skateboards, package delivery trolleys, garbage trucks, delivery trucks, food carts, 5G towers, dining sheds, trees, CitiBike docks, buses, taxis, ambulances and on-street parking.
It's like a giant game of Tetris, except all the pieces just won't fit.
In fact, some of the pieces are growing larger: In the past decade, the average vehicle got 12 percent longer and 17 percent wider. (Cars' blind spots have also gotten larger.)
And the number of pieces just keeps expanding. New York City's population reached 8.8 million in 2020, and the New York region is now home to nearly 19 million people. The city's population has dropped some in the past few years, but city officials believe that recent population estimates have significantly underestimated the number of newly arrived migrants, which, by some counts, is over 180,000.
Taming the streets
Even as New York's streets and sidewalks have become more chaotic, there are also plenty of examples of the opposite: moments when the city has tamed the traffic and found new uses for its old spaces.
Over the past 10 to 15 years, sweeping pedestrian plaza initiatives, detouring cars and encouraging space for sitting and strolling, have gradually changed the landscape, from the Jackson Heights neighborhood in Queens to Times Square.
Times Square was once full of traffic. In May 2009, the city closed Broadway to cars and set out lawn chairs, the start of the area's transformation to pedestrian plaza.
Damon Winter/The New York Times
The Open Streets program restored pedestrian-first streets, free of cars and safe enough for strolling, chatting and letting kids ride bikes.
The coronavirus pandemic ushered in a chance to rethink public spaces, and the absolute quiet on the streets during lockdown was a reminder that the city isn't inherently noisy, but traffic is.
And there are plenty of other places to look for inspiration: In Bogotá, Stockholm, London and Paris, certain streets are being closed to cars. There is an effort in Europe to avoid the oversize pickup trucks and SUVs that make American roads so deadly. Paris has designated "school streets" where cars have been removed to make way for children. Cycling is flourishing in Europe; emissions are down.
In New York, Ms. Sadik-Khan, the former transportation commissioner, is among the people thinking deeply about the future of streets, and she is optimistic.
"There's a new generation of New Yorkers who've never known a city without protected bike lanes and bike share," Ms. Sadik-Khan said. "More people than ever are working from home. Commuting patterns are in flux. There's the opportunity to make a new deal for people getting around."
What will a "new deal" look like? And will New Yorkers be on board?
No matter what happens, change doesn't come without a fight, and many of the battles will be fought street by street and block by block.
Over the next few months, we will take a close look at some of these street fights, and we're eager to hear about yours, too.
The Kaggle competition named "New York City Taxi Trip Duration" consists of the 2016 NYC Yellow Cab trip record data, which was originally published by the NYC Taxi and Limousine Commission (TLC). This competition demands us to build a model that predicts the total ride duration of taxi trips in New York City. Thus, the problem statement is ...
The mean difference between predicted and actual duration is -739.25, i.e., a model based on yellow taxis predicts a travel duration roughly 12 minutes shorter. One reason for the lower travel time ...
New York City taxi rides form the core of the traffic in the city of New York. The many rides taken every day by New Yorkers in the busy city can give us a great idea of traffic times, road blockages, and so on. Predicting the duration of a taxi trip is very important since a user would always like to know precisely how much time it would require of him to travel from one place to another ...
This is a comprehensive Exploratory Data Analysis for the New York City Taxi Trip Duration competition with Python and Data Visualization libraries such as matplotlib and seaborn. I also use New York City Taxi with OSRM to support the primary dataset.. The goal of this playground challenge is to predict the duration of taxi rides in NYC based on features like trip coordinates or pickup date ...
In this competition, Kaggle is challenging you to build a model that predicts the total ride duration of taxi trips in New York City. Your primary dataset is one released by the NYC Taxi and Limousine Commission, which includes pickup time, geo-coordinates, number of passengers, and several other variables. Longtime Kagglers will recognize that ...
Payment Etiquette: Tips and Tolls. In New York City, tipping your taxi driver is customary and appreciated as a gesture of thanks for good service. A tip of 20% of the fare is standard, though you may choose to tip more for exceptional service or convenience. For rides that require tolls, the tolls are typically added to the final fare, and ...
Developed various models to predict the total ride duration of taxi trips in New York City. Data description: The dataset is based on the 2016 NYC Yellow Cab trip record data made available in BigQuery on Google Cloud Platform.
Yes, but only if the trip is more than 12 hours long, or if their 'taxi' light is off. 12 hour+ journeys are against the law in the US, and only taxis with their lights on are currently working. If you're staying far out of the city centre, perhaps get in the cab before telling them where you're going.
Scatterplot of all pickups and dropoffs in New York City Summary. This post explores a subset of the NYC taxi dataset for the month of April 2013. I extract, transform and load the trip fare and trip details csv files into a sqlite database. I use this data to predict the fare and tip taxi drivers will receive.
What a taxi costs. I remember not too long ago (or so it seems) when taking taxis in New York a decent distance would cost $5 to $7. Those days are long gone. The base fare is $2.50 with 50 cents added every 1/5 of a mile or 60 seconds of slow traffic or stop time.
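The meter rule quoted above can be turned into a small estimate. The sketch below is a simplification and not the official TLC fare schedule: it ignores surcharges, tolls, taxes and tips, and it assumes the meter only advances on completed fifths of a mile and completed minutes of slow or stopped time. The function and constant names are illustrative.

```python
import math

BASE_FARE = 2.50    # dollars, as quoted above
UNIT_CHARGE = 0.50  # dollars per 1/5 mile or per 60 seconds of slow traffic


def metered_fare(miles: float, slow_seconds: float = 0) -> float:
    """Estimate a metered fare from distance and slow-traffic time.

    Assumption: only completed units are charged.
    """
    # Round before flooring to avoid float artifacts such as 0.6 * 5.
    distance_units = math.floor(round(miles * 5, 6))
    time_units = int(slow_seconds // 60)
    return BASE_FARE + UNIT_CHARGE * (distance_units + time_units)


print(metered_fare(2.0))       # two miles with no slow traffic
print(metered_fare(1.0, 120))  # one mile plus two minutes of slow traffic
```

Under these assumptions a two-mile ride in free-flowing traffic comes to $7.50 before any surcharges or tip.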
New York City Taxi Trip Duration Prediction Using Machine Learning. May 2023. DOI: 10.22214/ijraset.2023.52768. Authors: Nandeshvar R K, Dr. Janaki K, Avin Joseph, K Sakthivel, and one other.
There is no visible relation between trip duration and passenger count. Trip duration per hour: sns.lineplot(x='pickup_hour', y='trip_duration', data=data). Trip duration is highest around 3 pm, which may be because of traffic on the roads, and lowest around 6 am, when streets may not be busy.
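The hourly comparison described above amounts to grouping trips by pickup hour and averaging their durations. A minimal standard-library sketch follows. The sample trips are made up for illustration and are not taken from the dataset, and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_duration_by_hour(trips):
    """Average trip duration (seconds) for each pickup hour.

    `trips` is an iterable of (pickup_datetime, trip_duration_seconds) pairs.
    """
    by_hour = defaultdict(list)
    for pickup, duration in trips:
        by_hour[pickup.hour].append(duration)
    return {hour: mean(values) for hour, values in sorted(by_hour.items())}


# Hypothetical sample data, for illustration only.
sample_trips = [
    (datetime(2016, 3, 14, 6, 5), 420),
    (datetime(2016, 3, 14, 6, 40), 480),
    (datetime(2016, 3, 14, 15, 10), 1100),
    (datetime(2016, 3, 14, 15, 55), 1300),
]

hourly = mean_duration_by_hour(sample_trips)
busiest_hour = max(hourly, key=hourly.get)
print(hourly, busiest_hour)
```

The resulting dictionary holds the same hour-by-hour averages that a line plot of trip_duration against pickup_hour would display.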
Since the duration of the taxi trip is highly dependent on the time at which the trip is made, the prediction becomes highly complex. In this regard, we have taken into account the time of the trip for reliable predictions. Also, we have excluded co-ordinates of locations present outside New York City because of their outlying nature.
trip_duration: (target) duration of the trip in seconds Thus we have a data set with 729322 rows and 11 columns. There are 10 features and 1 target variable which is trip_duration
Poongodi et al. (Poongodi et al., 2021) presented a trained XGBoost model that was able to predict the taxi trip durations having an RMSE value of 0.39, and concluded that XGBoost performed better ...
The average speed of a taxi in New York City is about 11 km/hour. The data has several data points with a speed way beyond that. We will now have a look at the distribution of the distance ...
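One way to spot trips with implausible speeds is to estimate distance from the pickup and drop-off coordinates and divide by the recorded duration. The sketch below uses the haversine great-circle formula, which underestimates the distance actually driven. The 100 km/h cutoff and the function names are assumptions for illustration, not values from the source.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def speed_kmh(distance_km, duration_s):
    """Average speed in km/h for a trip of the given distance and duration."""
    return distance_km / (duration_s / 3600.0)


def is_plausible(distance_km, duration_s, max_kmh=100.0):
    """Flag trips with a positive duration and a speed under the cutoff.

    Assumption: max_kmh is an arbitrary illustrative threshold.
    """
    return duration_s > 0 and speed_kmh(distance_km, duration_s) <= max_kmh
```

A trip covering 11 km in one hour matches the roughly 11 km/hour average mentioned above, while a 50 km trip recorded as taking 60 seconds would be flagged as an outlier.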
If route A is X kilometres longer but gets you there Y minutes faster than route B would, one would take route B over A. The New York City Taxi and Limousine Commission (TLC) deals with the licensing of taxicabs operated by private companies in New York, along with overseeing about 40,000 other for-hire vehicles.
Dataset: New York City Taxi Duration Dataset. The dataset is based on the 2016 NYC Yellow Cab trip record data made available in BigQuery on Google Cloud Platform. The training set contains 1458644 trip records and the testing set contains 625134 trip records.
Some New York City taxi drivers could get hit by the full congestion pricing charge if one of the city's two taxi technology companies doesn't play ball with the MTA. Curb, one of the taxi ...