TLC Trip Record Data

Yellow and green taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. The data used in the attached datasets were collected and provided to the NYC Taxi and Limousine Commission (TLC) by technology providers authorized under the Taxicab & Livery Passenger Enhancement Programs (TPEP/LPEP). The trip data was not created by the TLC, and TLC makes no representations as to the accuracy of these data.

For-Hire Vehicle (“FHV”) trip records include fields capturing the dispatching base license number and the pick-up date, time, and taxi zone location ID (shape file below). These records are generated from the FHV Trip Record submissions made by bases. Note: The TLC publishes base trip record data as submitted by the bases, and we cannot guarantee or confirm their accuracy or completeness. Therefore, this may not represent the total number of trips dispatched by all TLC-licensed bases. The TLC performs routine reviews of the records and takes enforcement actions when necessary to ensure, to the extent possible, complete and accurate information.

On 05/13/2022, we are making the following changes to trip record files:

  • All files will be stored in the PARQUET format. Please see the ‘Working With PARQUET Format’ under the Data Dictionaries and MetaData section.
  • Trip data will be published monthly (with two months delay) instead of bi-annually.
  • HVFHV files will now include 17 more columns (please see High Volume FHV Trips Dictionary for details). Additional columns will be added to the old files as well. The earliest date to include additional columns: February 2019.
  • Yellow trip data will now include 1 additional column (‘airport_fee’, please see Yellow Trips Dictionary for details). The additional column will be added to the old files as well. The earliest date to include the additional column: January 2011.

Due to COVID-19 and its impact on the daily operations of small businesses, TLC granted smaller bases an extension on trip record submissions. Trip and trip-related data for these bases will be updated as it becomes available.

Data Dictionaries and MetaData

  • Trip Record User Guide
  • Yellow Trips Data Dictionary
  • Green Trips Data Dictionary
  • FHV Trips Data Dictionary
  • High Volume FHV Trips Data Dictionary
  • Working With PARQUET Format

Taxi Zone Maps and Lookup Tables

  • Taxi Zone Lookup Table (CSV)
  • Taxi Zone Shapefile (PARQUET)
  • Taxi Zone Map – Bronx (JPG)
  • Taxi Zone Map – Brooklyn (JPG)
  • Taxi Zone Map – Manhattan (JPG)
  • Taxi Zone Map – Queens (JPG)
  • Taxi Zone Map – Staten Island (JPG)
  • 09/08/2017 - FHV trip record files from June 2017 updated as of 09/08/2017
  • 08/30/2017 - FHV trip record files from July 2016 through June 2017 updated as of 08/16/2017
  • 03/13/2017 - FHV trip record files from January 2016 through December 2016 updated as of 02/14/2017
  • 09/22/2015 - TPEP and LPEP trip data PARQUETs from January through June 2015 have been updated to include a new field [improvement_surcharge] which lists the itemized portion of the fare covering the Taxicab Improvement Surcharge or Street Hail Livery Improvement Surcharge. This is a $0.30 surcharge on all trips to help fund accessibility in taxis and SHLs, which began on January 1, 2015. All TPEP and LPEP trip data files uploaded moving forward will also include this new field.

Build a model that predicts the total ride duration of taxi trips in New York City

dsankush/NYC-Taxi-Trip-Time-Prediction

AlmaBetter Verified Project - AlmaBetter School

NYC Taxi Trip Time Prediction

Developed various models to predict the total ride duration of taxi trips in New York City

💾 Data Description

The dataset is based on the 2016 NYC Yellow Cab trip record data made available in Big Query on Google Cloud Platform. The data was originally published by the NYC Taxi and Limousine Commission (TLC). The data was sampled and cleaned for the purposes of this project. Based on individual trip attributes, you should predict the duration of each trip in the test set.

NYC Taxi Data.csv - the training set (contains 1458644 trip records)

💾 Project Files Description

This project includes 1 executable file, 3 text files, and 1 directory, as follows:

Executable Files:

  • NYC_Taxi_Trip_Time_Prediction_Capstone_Project.ipynb - Includes all functions required for Regression operations.
  • Project presentation.docx - Contains the presentation of the completed analysis.
  • Project report.pdf - Contains whole analysis strategy and analysis methodology followed for the project.

Source Directories:

  • NYC Taxi Data.csv - Includes all the required data for the Regression task.

-----------------------------------------------------

📖 XGBOOST (Ensemble Model)

Before turning to the mathematics of gradient boosting, consider a simple example: a CART (classification and regression tree) that classifies whether someone will like a hypothetical computer game X. A tree ensemble sums the predictions of several such trees:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$$

where K is the number of trees, each f_k is a function in the functional space F, and F is the set of all possible CARTs. The objective function for the above model is given by:

$$\text{obj}(\theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$

where the first term is the loss function and the second is the regularization term. Instead of learning all the trees at once, which makes the optimization harder, we apply an additive strategy: keep what has already been learned and add one new tree at a time, which can be summarised as:

$$\hat{y}_i^{(t)} = \sum_{k=1}^{t} f_k(x_i) = \hat{y}_i^{(t-1)} + f_t(x_i)$$

XGBoost minimizes a regularized (L1 and L2) objective function that combines a convex loss function (based on the difference between the predicted and target outputs) and a penalty term for model complexity (in other words, the regression tree functions).
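
The additive strategy can be illustrated with a small pure-Python sketch. This is plain gradient boosting with one-split regression trees (stumps) under squared loss, not XGBoost itself: it omits the regularization term and the second-order approximation that XGBoost uses. The function names, the learning rate, and the sample data are assumptions made for illustration only.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) minimizing squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=50, learning_rate=0.3):
    """Additive training: each round adds one stump fitted to the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    trees = []
    preds = [base] * len(xs)
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(t(x) for t in trees)


# Hypothetical data: trip distance in miles vs. duration in minutes.
miles = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
minutes = [5.0, 6.0, 9.0, 14.0, 20.0, 27.0]
model = boost(miles, minutes)


def mse(predict):
    """Mean squared error of a prediction function on the sample data."""
    return sum((y - predict(x)) ** 2 for x, y in zip(miles, minutes)) / len(miles)


baseline = sum(minutes) / len(minutes)
print(mse(lambda x: baseline), mse(model))
```

Each round only has to fit one small tree to what the previous trees got wrong, which is why the additive form is easier to optimize than learning all trees jointly.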

📋 Execution Instructions

The order of execution of the program files is as follows:

1) NYC_Taxi_Trip_Time_Prediction_Capstone_Project.ipynb

Execute NYC_Taxi_Trip_Time_Prediction_Capstone_Project.ipynb to reproduce all the analysis done for the regression task.

📜 Conclusions

1) Regression analysis was conducted to create a system that can assist taxi companies in determining the duration of cab trips, thereby improving their business models and enhancing customer availability.

2) The problem statement was resolved by using various regression techniques, including linear regression, decision trees and XGBOOST regressions.

3) Preparing the data for analysis and loading it for machine learning models involved PCA transformation, feature engineering, and forward Feature selection.

4) Various regression models were evaluated, and XGBOOST performed better than the other models, with a high R2 score and a low RMSE (reported as 97% accuracy).
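
For reference, the two metrics named in the last conclusion can be computed as in the sketch below. It is a minimal pure-Python illustration; the duration values are hypothetical and are not taken from the project's data or results.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical trip durations in seconds.
actual = [300.0, 600.0, 900.0, 1200.0]
predicted = [330.0, 570.0, 930.0, 1170.0]
print(rmse(actual, predicted), r2_score(actual, predicted))
```

RMSE is expressed in the units of the target (here seconds), while R2 is unitless, so the two are usually reported together.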

📜 Future work

- This data set covers only about 6 months. Data spanning more than a year, along with additional features, would let the models train on more significant information, learn more efficiently, and achieve higher performance.

- Additional features would also allow further exploration of this kind of data.

Ankush Kumar | Data Science | Machine Learning Engineer | Deep Learning enthusiast

Contact me for Data Science Project Collaborations and Data Science related job roles

Cab Etiquette In NYC: All You Need to Know

We’ve all been there. You stayed out a little later than you planned, and you’re a little worse for wear. You need to go to bed, but the city’s unfamiliar to you. The public transport maps might as well be Jackson Pollock paintings. So you do what every single person does in films and TV shows based in New York. You raise your hand, and within seconds a yellow cab’s pulled up beside you. Hopefully you’re on your way in seconds and home safe and sound, but if anything seems off or you need help and advice, read on. Here’s what you need to know about cab etiquette in NYC.

Can a cab driver ever refuse me service?

Yes, but only if the trip is more than 12 hours long, or if their ‘taxi’ light is off. 12 hour+ journeys are against the law in the US, and only taxis with their lights on are currently working. If you’re staying far out of the city centre, perhaps get in the cab before telling them where you’re going. It might seem sneaky, but once you’re in their cab they are legally obligated to take you to your destination. Crazy, right?

My taxi is loud and uncomfortable. What can I do?

A lot, thankfully. Riders have rights too, after all. If your driver is on a call or using their phone, they’re being super illegal. Feel free to remind them. If the cab is too hot or cold, depending on the time of year, you can also request they put the air con/heating on. And if their music is too loud, by all means, politely ask them to turn it down or off. Just don’t berate their choice of genre.

However, if the driver refuses these, or any reasonable requests, you have the right to get out at any time. And remember to take down their medallion number if you want to make a complaint. It’s on their licence plate, the hood of the taxi, and on your receipt if you request one.

What if I’m being loud, and making the driver uncomfortable?

Firstly, why... would you... do that? Secondly, while drivers have no legal grounds to ask you to keep it down, have some respect for them. And for yourself. Driving a taxi all day is exhausting, and navigating the hectic streets that never sleep requires concentration. Cab etiquette in NYC, or anywhere, works both ways. Be respectful, and you’ll likely earn their respect. And a safer and quicker journey home, too.

Should I stare at them creepily through the rear-view mirror?

No. No, don’t. Why would you even...?

How much should I tip?

Tips are big business in New York, as they are in the rest of the US. But sadly you’ll be expected to pay over the odds in the Big Apple. 20% of the fee is the recommended amount. If you’re paying with card instead of cold hard cash, the amount of gratuity will automatically be added to the charge. It could go as high as 30%, so keep that in mind if you’re squeezing pennies. Of course, if you’re an out-of-towner and they’ve been helpful with info or recommendations, why not be a nice little human and show them your gratitude with money?

Tipping’s the best way to thank them, but if you want to go above and beyond because they did, hop on the nyc.gov website and leave a glowing review, you selfless beauty.

If the driver asks for cash, is it OK to use my card instead?

Yes. Every taxi in NYC is required by law to take card, so if your driver says they don’t have a machine or that it’s broken, it’s a ruse. Persist, and victory will be yours. Drivers may also mention they’ve selected ‘Cash’ instead of ‘Card’ and that they can’t reverse the decision. This, too, is a ruse. Stay strong, and wait for the card machine. It’s simply a case of them pressing a single button to make it happen. Also get your receipt - it contains lots of vital information like their medallion number which you’ll need if you lose something in the cab, or want to make a complaint.

That’s what you need to know about taxi etiquette in NYC. We hope these tips help. Of course, we’re always open to suggestions, so if you have any other top tips you’d like to add, let us know in the comments below! Stay safe, travelers.

Has this cab etiquette in NYC blog satisfied your itch for all things New York? No? Still prefer public transportation? Sure thing, here's more about the metro system in NYC.

Abhishek Das

Lead Data Scientist at KPMG Digital Delta

Exploring NYC Taxi Data (Updated)

Scatterplot of all pickups and dropoffs in New York City

This post explores a subset of the NYC taxi dataset for the month of April 2013. I extract, transform and load the trip fare and trip details csv files into a sqlite database. I use this data to predict the fare and tip taxi drivers will receive. The repository containing my entire analysis is here and the presentation slides are available here as pptx and here as pdf.

The April 2013 taxi data is provided in two csv files: trip_details and trip_fares. I extract, transform and load these csv files into two separate tables, trips_table and fares_table, in a SQLite database. I make subsequent calls to these tables in my EDA and modeling notebooks. The ETL process is detailed here.
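
A minimal sketch of such a load step, using only Python's standard library, might look like the following. It is not the author's ETL code: the helper name load_csv, the column names, and the two in-memory sample files are assumptions for illustration, and every value is stored as text without type conversion.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_file):
    """Create `table` from the CSV header and bulk-insert every row as text.

    Table and column names are interpolated into SQL, so they must be trusted.
    """
    reader = csv.reader(csv_file)
    header = [name.strip() for name in next(reader)]  # drop stray whitespace
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


# Tiny in-memory stand-ins for the trip_details and trip_fares csv files.
trip_details = io.StringIO(
    "medallion,hack_license,vendor_id,pickup_datetime,trip_distance\n"
    "M1,H1,CMT,2013-04-01 08:00:00,2.1\n"
    "M2,H2,VTS,2013-04-01 09:30:00,5.4\n"
)
trip_fares = io.StringIO(
    "medallion,hack_license,vendor_id,pickup_datetime,fare_amount\n"
    "M1,H1,CMT,2013-04-01 08:00:00,9.5\n"
)

conn = sqlite3.connect(":memory:")
load_csv(conn, "trips_table", trip_details)
load_csv(conn, "fares_table", trip_fares)
print(conn.execute("SELECT COUNT(*) FROM trips_table").fetchone()[0])
```

For the real files, the StringIO objects would be replaced by opened csv files and the in-memory database by a database file on disk.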

Data Cleaning

Each table is cleaned for outliers, including restricting latitude and longitude co-ordinates to lie between (40.67, -74.027) and (40.85, -73.85). I remove trips that last for 0 minutes and 0 miles. I also restrict the dataset to trips made within New York City and its two closest airports: La Guardia and JFK International. This excludes trips made to Westchester and Nassau counties (Rate Code 4) as well as out-of-town trips (Rate Code 5), which are negotiated at a flat fee, so the odometer reading or trip time is not indicative of the distance or duration of the trip. While these actions may bias the results, 99.84% of all trips in April 2013 were made within the city of New York. A few thousand trips have their payment registered as ‘Disputed’, ‘No Charge’ or ‘Unknown’. These trips were excluded, as passengers overwhelmingly paid by credit card or cash. Finally, the NYC Taxi and Limousine Commission permits a maximum of 6 passengers in a 5-passenger taxicab if the sixth passenger is a child under 7 who can sit on an adult’s lap. I screen out all trips with more than 6 passengers.
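
The cleaning rules above can be expressed as a single predicate, sketched below. The dictionary keys and the payment labels "card" and "cash" are hypothetical names chosen for this example; the bounding box, rate codes, and passenger limit come from the text. The sketch reads "0 minutes and 0 miles" as requiring both to be zero, which is an assumption.

```python
# Bounding box from the text: (40.67, -74.027) to (40.85, -73.85).
LAT_MIN, LAT_MAX = 40.67, 40.85
LON_MIN, LON_MAX = -74.027, -73.85


def in_bounds(lat, lon):
    """True if the point lies inside the bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def is_clean(trip):
    """Apply the cleaning rules to one trip, given as a dict (hypothetical keys)."""
    return (
        in_bounds(trip["pickup_lat"], trip["pickup_lon"])
        and in_bounds(trip["dropoff_lat"], trip["dropoff_lon"])
        and not (trip["minutes"] == 0 and trip["miles"] == 0)
        and trip["rate_code"] not in (4, 5)          # Westchester/Nassau, negotiated fares
        and trip["payment_type"] in ("card", "cash")  # drop Disputed/No Charge/Unknown
        and trip["passengers"] <= 6
    )


good = dict(pickup_lat=40.75, pickup_lon=-73.99, dropoff_lat=40.77,
            dropoff_lon=-73.87, minutes=25, miles=9.1, rate_code=1,
            payment_type="card", passengers=2)
too_many = dict(good, passengers=7)
negotiated = dict(good, rate_code=5)
print(is_clean(good), is_clean(too_many), is_clean(negotiated))
```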

Data Merging

Using taxi medallion as the unique taxicab identifier, hack license as the unique driver identifier, and vendor id and pickup_datetime as common keys, I merge the two tables above and return just over 14 million rows of data. Going forward, I use the assignment questions to propel my EDA of the dataset. The complete data munging process is described in this notebook.
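
A merge on those four keys can be sketched as an inner join in SQLite, as below. The table names follow the text; the column names other than the keys and all sample rows are assumptions made for illustration, not the author's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE trips_table (medallion, hack_license, vendor_id, pickup_datetime,
                          trip_distance REAL);
CREATE TABLE fares_table (medallion, hack_license, vendor_id, pickup_datetime,
                          fare_amount REAL, tip_amount REAL);
INSERT INTO trips_table VALUES
  ('M1', 'H1', 'CMT', '2013-04-01 08:00:00', 2.1),
  ('M2', 'H2', 'VTS', '2013-04-01 09:30:00', 5.4);
INSERT INTO fares_table VALUES
  ('M1', 'H1', 'CMT', '2013-04-01 08:00:00', 9.5, 2.0),
  ('M3', 'H3', 'VTS', '2013-04-01 10:00:00', 12.0, 0.0);
""")

# Inner join: keep only trips that have a matching fare record on all four keys.
merged = conn.execute("""
SELECT t.medallion, t.pickup_datetime, t.trip_distance, f.fare_amount, f.tip_amount
FROM trips_table AS t
JOIN fares_table AS f
  ON  t.medallion = f.medallion
  AND t.hack_license = f.hack_license
  AND t.vendor_id = f.vendor_id
  AND t.pickup_datetime = f.pickup_datetime
""").fetchall()
print(merged)
```

Rows without a counterpart in the other table (M2 and M3 here) are dropped by the inner join.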

Basic Questions

Complete notebook available here.

Q1. What is the distribution of the number of passengers in each cab?

Overwhelmingly taxi cabs are hailed by a sole passenger. More cabs are hailed by a single passenger than the total number of cabs hailed by two or more passengers as shown in Figure 1.

Q2. Do most customers pay with cash or card?

The results are pretty close, with nearly 54% of trips being paid for by credit card.

Q3.1 What does the distribution of fare amounts look like?

The initial charge on a cab is $2.50, so I confirm there are no fares below this amount. The most expensive fare was $204 while most fares were under $35.

Winsorizing the fare amount data by removing the top and bottom 1% shows the median fare amount is $9. The bottom 5% of fare amounts are less than $5, while the top 5% of fares are higher than $24.
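
The post describes removing the extreme 1% at each end, which corresponds to the trim function in the sketch below. Strictly speaking, winsorizing clamps extreme values to the cut-off instead of dropping them, so a winsorize function is shown for contrast. Both helpers and the sample fares are assumptions for illustration only.

```python
import statistics


def trim(values, fraction=0.01):
    """Drop the lowest and highest `fraction` of the sorted values."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)
    return ordered[k:len(ordered) - k]


def winsorize(values, fraction=0.01):
    """Clamp, rather than drop, values beyond the same cut-offs."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)
    low, high = ordered[k], ordered[len(ordered) - k - 1]
    return [min(max(v, low), high) for v in values]


# Hypothetical fares: one minimum fare, one extreme fare, the rest at $9.
fares = [2.5] + [9.0] * 98 + [204.0]
print(len(trim(fares)), statistics.median(trim(fares)))
print(len(winsorize(fares)), max(winsorize(fares)))
```

Trimming shrinks the sample, while winsorizing keeps its size; the median is the same under both in this example.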

Q3.2 Is there a difference between airport and non-airport fare amounts?

Figure 5 shows the median and modal airport fare amount is $52. This is in contrast to non-airport fares, which tend to be under $35.

I looked into whether this specific fare amount was more or less likely to be paid by cash or credit card but the results were split with 56% of passengers choosing to pay with card and 44% choosing to pay with cash.

Q4.1 What does the distribution of tip amount look like?

When looking across all rides, most passengers don’t appear to tip well. While tip amounts via credit card can be verified, cash tips may be underreported by the taxi drivers themselves. Winsorizing the distribution of tip amounts, the modal tip amount is $0 while the median is $1.

Q4.2 Is there a difference in the distribution of airport versus non-airport tips?

However, passengers tend to be more generous when it comes to tipping the cabbies that take them to the airport. Even though there are fewer airport fares compared to non-airport fares, it is understandable that drivers would want to take more airport trips.

Q5.1 What does the distribution of total amount look like?

Given the relatively low tip amounts reported, the distribution of the total amount will be similar to the distribution of the fare amount. Winsorizing the total amount by removing the top and bottom 1% shows the median amount is just shy of $11.

Q5.2 Is there a difference in the distribution of airport versus non-airport total amounts?

Airport total amounts are higher than non-airport total amounts which makes sense as airport fares are higher than non-airport fares. The median/modal airport total amount is $57.83.

Q6. What are the top 5 busiest hours of the day?

Evenings after work or dinner appear to be the busiest, with most cabs being hailed at 7pm. The heatmap below shows that Monday and Tuesday evenings between 6pm - 8pm, followed by Friday and Saturday evenings, are when most taxi trips occur.
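
Counting pickups by hour of day is one simple way to rank the busiest hours. The sketch below assumes pickup timestamps formatted as "YYYY-MM-DD HH:MM:SS"; the function name and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime


def busiest_hours(pickup_times, top_n=5):
    """Return the `top_n` (hour, trip_count) pairs, busiest first."""
    hours = Counter(
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour for ts in pickup_times
    )
    return hours.most_common(top_n)


sample = [
    "2013-04-01 19:05:00", "2013-04-01 19:40:00", "2013-04-02 19:10:00",
    "2013-04-02 08:15:00", "2013-04-03 08:45:00", "2013-04-03 03:30:00",
]
print(busiest_hours(sample, top_n=2))
```

Grouping on the pair (weekday, hour) instead of the hour alone would give the counts behind a day-by-hour heatmap.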

Q7. What are the top 10 busiest locations in the city?

I reduce latitude and longitude to 2 decimal places and then rank the most popular pickup and dropoff locations. The top pickup and dropoff locations are identical and are all located in Manhattan, accounting for over 12.36 million trips. Note that rounding latitude and longitude increases the clustering of pickup and dropoff points.
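
The binning step can be sketched as follows. The sketch assumes round-to-nearest at 2 decimal places (the post does not say whether values were rounded or truncated), and the function name and sample coordinates are hypothetical.

```python
from collections import Counter


def top_locations(points, top_n=10, decimals=2):
    """Bin (lat, lon) points into rounded cells and rank cells by trip count."""
    cells = Counter(
        (round(lat, decimals), round(lon, decimals)) for lat, lon in points
    )
    return cells.most_common(top_n)


# Two nearby midtown points fall into the same cell; the third is far away.
pickups = [(40.7580, -73.9855), (40.7614, -73.9912), (40.6413, -73.7781)]
print(top_locations(pickups, top_n=1))
```

At this latitude a 0.01 degree cell is roughly a kilometre across, which is why nearby pickups collapse into the same location.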

Q8. Which trip has the highest standard deviation of travel times?

Each trip is uniquely defined by its pickup and dropoff co-ordinates. It is important to determine what minimum sample size of trips to use to calculate the standard deviation. If a trip is unique, for example, then we cannot calculate its standard deviation of travel times. What minimum sample size do we need to determine the standard deviation of trip times? If a trip has occurred twice, one trip being 5 minutes long and one trip being an hour long, this trip will have a very high standard deviation based on a relatively small sample.

I make the following assumptions:

  • Margin of Error = 5%
  • Confidence Level = 95% which is a Z-Score of 1.96
  • Standard Deviation = 0.5 (expecting 50% standard deviation will ensure large enough sample size)

The required sample size = ((1.96 x 0.5)/0.05)^2 = 384.16, rounded up to 385 trips, which is the minimum threshold of trips applied. All routes with fewer than 385 trips over the month are excluded. This minimum threshold is applied when answering all questions going forward.
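
The arithmetic above can be checked with a few lines of Python. The function name is an assumption; the inputs are the three values listed in the assumptions.

```python
import math


def required_sample_size(z_score, std_dev, margin_of_error):
    """Sample size n = (z * sigma / e)^2, rounded up to a whole trip."""
    return math.ceil((z_score * std_dev / margin_of_error) ** 2)


print(required_sample_size(1.96, 0.5, 0.05))  # 385
```

Rounding up rather than to the nearest integer ensures the margin of error is not exceeded.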

Travel times for trips originating from La Guardia airport to New York’s boroughs have the largest variance. Apparently airport traffic IS a nightmare.
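
A sketch of the per-route calculation with the minimum-trips threshold is shown below. It uses the sample standard deviation, which is an assumption since the post does not say which estimator was used; the function name, route labels, and durations are hypothetical.

```python
import statistics
from collections import defaultdict


def route_duration_std(trips, min_trips=385):
    """Standard deviation of travel time per route, skipping sparse routes.

    `trips` is an iterable of (pickup, dropoff, minutes) tuples.
    """
    by_route = defaultdict(list)
    for pickup, dropoff, minutes in trips:
        by_route[(pickup, dropoff)].append(minutes)
    return {
        route: statistics.stdev(durations)
        for route, durations in by_route.items()
        if len(durations) >= min_trips
    }


# Toy data with a lowered threshold so the example stays small.
trips = [
    ("LGA", "Midtown", 10.0), ("LGA", "Midtown", 12.0), ("LGA", "Midtown", 14.0),
    ("SoHo", "Harlem", 5.0), ("SoHo", "Harlem", 60.0),  # only two trips: excluded
]
print(route_duration_std(trips, min_trips=3))
```

The two-trip route would otherwise report a very large standard deviation from almost no data, which is the situation the threshold is meant to avoid.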

Q9. Which trip has the most consistent fares?

Using the same minimum sample size threshold, I now examine the 5 routes whose fare amounts have the lowest standard deviation, as these will be the most consistent fares. Figure 13 reveals that these are shorter non-airport routes. Three of these trips begin at the same location in Manhattan. To get more color on the differences between trips, it would be interesting to understand the time of day and day of the week the trips were occurring on.

Open Questions

Q10. For which trips can we confidently use means as measures of central tendency to estimate fares and time taken?

As mentioned in question 8 above, certain trips may only occur once or twice, making calculations of central tendency based on these trips biased and erroneous. If the same trip takes twice as long for one taxi driver as it does for another, and our population is two trips, this skews calculated means and variances.

So how many occurrences of the same trip - identified as beginning and ending at the same geocodes - are required before measures of central tendency can be calculated with confidence? Among 14 million trips, should the threshold be 50 occurrences of the same trip or 1000?

The required sample size = ((1.96 x 0.5)/0.05)^2 = 384.16, rounded up to 385 trips, which is the minimum number of occurrences of a trip over the month needed to be comfortable calculating measures of central tendency to estimate fares. Most of the trips which originate and end in Manhattan, including trips to La Guardia or JFK airports, cross this threshold easily.

Q11. Build a model of Taxi Fare and tip given pickup and dropoff location

Fare modeling notebook available here, while the tip prediction notebook can be found here.

I examine the correlations between trip fare (and log fare) versus the features in my database. It stands to reason that several features will have a high correlation with the fare including how long the trip was and the distance covered.

The variables chosen to predict fare and percentage tip include:

  • Average Speed Each Hour
  • Trips per Hour
  • Pickup Longitude and Latitude
  • Dropoff Longitude and Latitude
  • Trip Distance
  • Pickup Hour
  • Dropoff Hour
  • Day of Week
  • Day of Month

As there are millions of data points, I use a RANSAC regression model with the 9 features above, as it is robust to outliers in the y-axis. The model has an R-squared of 80.7%. The OLS model has a slightly higher R-squared of 82%, which persists after running five-fold cross-validation. I show a line of best fit among the fare data in Figure 15 below.
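
To illustrate why RANSAC is robust to outliers in y, here is a hand-rolled single-feature sketch in pure Python. It is not the author's model (which used 9 features) and not a library implementation; the function names, the inlier threshold, the iteration count, and the synthetic fares are all assumptions for illustration.

```python
import random


def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ransac_line(points, n_iter=200, threshold=1.0, seed=0):
    """Fit lines to random pairs, keep the largest inlier set, refit on it."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical line cannot be written as y = a*x + b
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return fit_line(best_inliers)


# Synthetic fares: $2.50 flag drop plus $2.00 per mile, plus two $50 outliers.
points = [(float(m), 2.5 + 2.0 * m) for m in range(1, 11)]
points += [(3.0, 50.0), (7.0, 50.0)]

ols_slope, ols_intercept = fit_line(points)
ransac_slope, ransac_intercept = ransac_line(points)
print(ols_slope, ransac_slope)
```

The plain least-squares slope is pulled away from 2.0 by the two outliers, while the RANSAC fit ignores them because they never join the largest consensus set.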

Note the scatter plot of predicted versus actual fares shows a cluster of fares at the $50 mark, which requires further investigation. Either these fares were rounded or mis-reported by the taxi drivers. It may seem obvious that taxi drivers may negotiate lower fares up to $50. What is more puzzling are fares where our linear model predicts a high fare, but the taxi driver only reports $50. Are drivers pocketing these large fares?

When it comes to predicting the percentage of the fare that will be left as a tip, I use a similar linear model. This time an OLS regression has an R-square of 1.2% when it comes to predicting how much of the fare will be a tip. The linear model does a very poor job of predicting the percentage tip amount. We fare slightly better using a Neural Network and a Random Forest Regressor.

Q12. How would a taxi owner maximize earnings in a day?

I distinguish between a taxi owner and a taxi driver as follows. The taxi driver is represented by the hack license and their average daily earnings in April 2013 was $259 a day. A taxi owner owns the medallion for a given taxi and can have several drivers drive their cab. The average daily earning by a medallion (taxicab) was $480 per day.

The average daily revenue for a taxicab (medallion) is $480 per day. The constraining factor here is driving hours per day. I consider two approaches: looking at routes that generate the highest daily revenue and the routes that earn the highest revenue per hour. This way a taxi owner can either concentrate on areas that generate the highest revenue or lease out a medallion taxi to two drivers driving 12 hour shifts to maximize daily revenue.

Figure 17 plots all routes that generate the highest daily revenue and the total number of trips required to generate this revenue. Note that these trips are all based in Manhattan, except for one which is from Manhattan to La Guardia Airport.

Note from question 8 that the trip to LaGuardia airport has the highest standard deviation of travel times. As the total amount charged will vary with the time of the trip, consistently relying on this trip for maximizing revenue may not be the best solution. A taxi owner could prioritize taxi bookings for Manhattan trips and potentially take on other trips once the taxi has crossed the average daily revenue of a taxicab ($480).

Another way to consider this problem is that we are trying to maximize revenue in the available time. I build a feature which is the ratio of total_amount to time driven in hours and see which routes maximize the earnings per hour driven. A taxi owner can concentrate on these routes. Note the assumption here is that these are the most profitable routes per hour regardless of time of day.
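
One way to sketch this ranking is shown below. It aggregates total amount and driving time per route before taking the ratio, which is an assumption; the post may instead have computed the ratio per trip. The function name, route labels, and amounts are hypothetical.

```python
from collections import defaultdict


def revenue_per_hour_by_route(trips):
    """Rank routes by total_amount earned per hour driven, highest first.

    `trips` is an iterable of (route, total_amount, minutes) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # route -> [amount, hours]
    for route, total_amount, minutes in trips:
        totals[route][0] += total_amount
        totals[route][1] += minutes / 60.0
    rates = {
        route: amount / hours
        for route, (amount, hours) in totals.items()
        if hours > 0  # skip routes with no recorded driving time
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


trips = [
    ("Midtown-Midtown", 10.0, 30),
    ("Midtown-Midtown", 20.0, 30),
    ("Midtown-JFK", 40.0, 120),
]
print(revenue_per_hour_by_route(trips))
```

In this toy data the short intra-Midtown route earns more per hour than the longer airport run, even though each airport trip has a higher fare.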

Interestingly, these trips begin and end at the same geocode (rounded to 2 d.p.), and they appear to be intra-Manhattan or intra-airport trips. These shorter trips can be used to maximize daily earnings. The results when trying to maximize the total amount earned per mile driven are identical.

It may be unrealistic to expect taxi drivers to keep driving back and forth between the same set of streets all day. In this case it may bear looking at less crowded routes as discussed in Q14 below.

Q13. How would a taxi owner minimize work time while retaining average wages earned by a typical taxi in the dataset?

As mentioned above, the average daily earning of a taxicab is $480. Looking at Figure 11 again, note that demand for taxicabs is highest on Monday and Tuesday evenings between 6pm - 8pm or Monday and Tuesday mornings between 8 and 10 am. This is the morning and evening office traffic.

A taxi owner looking to minimize their driver’s work time should ensure drivers are working the morning rush and evening shifts from 6pm onwards. By contrast demand for taxis is much lower between 3am - 5am Monday to Thursday, so these hours can be skipped over.

By far the evening route that generates the most revenue per hour begins and ends at geocode (40.77, -73.86) and is an airport-to-airport transfer route. It is followed by several routes within Manhattan, shown in Figure 16 below.

These evening routes generate the highest total fare amount per hour of driving. Taxis can focus on these locations until they hit their daily goal of $480 (or $259 per driver). After clearing their goal they can move on to more varied fares. Alternatively, if a taxi driver started off at a location other than the ones highlighted in Figure 16, they can drive to these routes to make up their daily fares.

Q14. How would a taxi company with 10 taxis, maximize earnings?

Assume each taxi can be driven all day by 2 drivers working 12 hour shifts without wear and tear. This translates to 20 shifts per day for the taxi company. I would ensure taxis are available at the most popular pickup and dropoff locations and for trips with the most consistent fares. However you wouldn’t want taxis working for the same company to undercut each other for the same fare.

My analysis so far reveals several insights for a smaller taxi company:

a. Instruct taxi drivers to focus on the routes with the highest earnings per hour (or earnings per mile). This would keep taxis working within smaller areas (zipcodes) and would allow the company to keep a fleet of cars working airport shifts and another fleet working Manhattan island shifts. The difficulty here is whether taxis could legally deny service to passengers who want to travel out of these zones.

b. Once a taxi driver has earned half the average daily earnings of a taxicab ($240) during their shift, give them the option to take fares whose trip times have a higher standard deviation, e.g. Manhattan to LaGuardia fares, or possibly out-of-town fares where the total amount earned may be higher.

c. The worst times to have a taxi out of service for maintenance are weeknights and Friday and Saturday nights, as these tend to be the busiest times. Get taxis serviced during the day.

d. The most popular routes may be overcrowded, so it may be worth focusing on trips that generate the highest total revenue with the smallest number of individual trips, as shown below.

Further Questions

It would be interesting to see the impact of services such as Uber or Lyft on taxi demand over time. Also of interest would be the impact of precipitation or temperature on a particular day on taxi demand, fare and tip. Finally, information on traffic congestion and road conditions would be invaluable for getting more insight from this dataset.


New York City taxi trip duration prediction using MLP and XGBoost

  • Author & abstract
  • 2 Citations
  • Related works & more

Corrections

(Hamad Bin Khalifa University, Qatar Foundation)

(Department of CTO 5G, Wipro Limited)

(CFO Technology, Enterprise Risk Function Technology, Bank of America)

(University of South Wales)

(Prince Sattam bin Abdulaziz University)

(King Abdulaziz University)



1 College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, Doha, Qatar

Mohit Malviya

2 Department of CTO 5G, Wipro Limited, Bengaluru, India

Chahat Kumar

3 CFO Technology, Enterprise Risk Function Technology, Bank of America, Chennai, India

Mounir Hamdi

V. Vijayakumar

4 University of South Wales, Sydney, Australia

Jamel Nebhen

5 College of Computer Engineering and Sciences, Prince Sattam bin Abdulaziz University, P.O. Box 151, Alkharj, 11942 Saudi Arabia

Hasan Alyamani

6 Department of Information Systems, Faculty of Computing and Information Technology in Rabigh, King Abdulaziz University, Rabigh, 21911 Saudi Arabia

New York City taxi rides form the core of the city's traffic. The many rides taken every day give a good picture of travel times, road blockages, and similar conditions. Predicting the duration of a taxi trip matters because a passenger wants to know how long it will take to travel from one place to another. Given the rising popularity of app-based taxi services such as Ola and Uber, competitive pricing has to be offered to ensure that users choose them. Predicting the duration and price of trips helps users plan their journeys and leave a margin for traffic congestion. It can also help drivers choose the route that takes the least time. Moreover, transparency about pricing and trip duration helps attract users at times when popular app-based services apply surge fares. In this study, we therefore use the data that a customer provides at the start of a ride, or while booking one, to predict duration and fare. This data includes the pickup and drop-off coordinates, the trip distance, the start time, the number of passengers, and a rate code for the class of cab, which determines whether a regular or an airport rate applies. We then apply XGBoost and Multi-Layer Perceptron (MLP) models to find out which provides better accuracy and better captures the relationships between these variables. A comparison of the two shows that XGBoost is better suited and more efficient than the MLP for taxi trip duration prediction.

Introduction

People everywhere need to move from one place to another, and advances in technology have produced many modes of transportation, including buses, autos, and especially taxi services. New York City is among the most advanced cities in the world and makes extensive use of taxis. With its large population, the city depends on a very large, widely available transportation system. New York has one of the largest subway systems in the world and a fleet of roughly 13,000 green and yellow cabs. Most of the population depends on public transport, and an estimated 54 percent of residents do not own a car or other personal vehicle. In total, the city accounts for almost 200 million taxi trips per year.

The dataset we used is available on Kaggle. It was collected over several years and released to the public for further analysis. We used a collection of these datasets covering about 3 years of NYC taxi trips, from January 2017 to January 2020, and considered about 15 lakh (1.5 million) records.

Among the machine learning models that offer reliable accuracy for prediction use-cases, we consider XGBoost and MLP because of their ability to capture complex interactions between features. Successful prediction of taxi trip duration here could later support better trip duration predictions in other cities.

XGBoost is short for “Extreme Gradient Boosting” and belongs to the family of ensemble learning algorithms. It is a flexible implementation built on decision trees (Gupta et al. 2020) and has been found to be much faster than more common algorithms such as AdaBoost.

It has recently dominated applied machine learning and drawn much attention in Kaggle competitions. Execution speed and model performance (Qureshi et al. 2020) are the two main reasons for using this algorithm in our work.

Multilayer Perceptron (MLP)

A perceptron is a linear classifier that produces a single output from a weighted linear combination of its inputs. A multilayer perceptron (MLP) is a class of feedforward artificial neural network (Sharma et al. 2020) that forms the basis of deep learning. It comprises more than one layer of perceptrons, and its nodes use a non-linear activation function. It is trained with the backpropagation algorithm, a supervised learning method (Butgereit and Martinus 2019). Because of its multiple layers and non-linear activations, an MLP can separate data that is not linearly separable, which distinguishes it (Kabán 2019) from a linear perceptron.
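To illustrate the claim that hidden layers with a non-linear activation can separate data a single linear perceptron cannot, the following minimal sketch evaluates a two-layer network on the XOR problem. The weights are hand-picked for illustration (they are not learned, and nothing here is taken from the paper's model), and the ReLU activation is an assumed choice.

```python
import numpy as np

def relu(z):
    """Rectifier activation: max(z, 0) applied element-wise."""
    return np.maximum(z, 0.0)

# Hand-picked illustrative weights (assumption, not learned from data).
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])   # input -> hidden (2 inputs, 2 hidden units)
b1 = np.array([0.0, -1.0])    # hidden biases
W2 = np.array([1.0, -2.0])    # hidden -> output
b2 = 0.0

def mlp_forward(x):
    """Forward pass: linear map, ReLU, then a second linear map."""
    hidden = relu(x @ W1 + b1)
    return hidden @ W2 + b2

# The four XOR inputs; no single straight line separates the two classes.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
outputs = mlp_forward(X)
print(outputs)  # XOR of the two inputs for each row
```

Removing the `relu` call collapses the two layers into a single linear map, which is why the non-linearity is essential.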

The contributions of this paper are as follows:

  • Since the duration of a taxi trip depends heavily on the time at which the trip is made, prediction becomes complex. We therefore take the time of the trip into account for reliable predictions. We also exclude coordinates that lie outside New York City as outliers. Using XGBoost combined with K-Means clustering, and given location, date, and time variables, we analyze and estimate ride duration from data collected from taxis.
  • A Multi-Layer Perceptron model is then used to determine the relationships between the variables in the taxi data.
  • XGBoost and the Multi-Layer Perceptron are then compared to determine which is more suitable and reliable for New York City taxi trip duration prediction.

The remainder of the paper is organized as follows. Sect. 2 discusses related work. Sect. 3 describes the New York City taxi duration dataset, and Sect. 4 presents the methodology. Sect. 5 provides simulation results and performance evaluation, and Sect. 6 summarizes our concluding remarks.

Related work

We studied a range of research on neural networks, multi-layer perceptrons, bagging and boosting, and other ML algorithms such as AdaBoost and XGBoost for prediction tasks. We tried to understand the methodology and workflow of each algorithm and how it could benefit our project. This analysis revealed the advantages and disadvantages of the algorithms that could potentially provide the best solution to our problem statement, and on that basis we decided how to approach the New York City taxi prediction use-case.

We started with Ran et al. (2020), where speed and traffic flow were used as inputs to the model. The centre points obtained by the K-means++ model and the calibrated values obtained by the XGBoost model are used to compute the Euclidean distance (ED), and the minimum of the calculated values is taken as the predicted congestion level. In a prediction test on I15-N interstate traffic data from the PeMS database, the combined model outperformed other models and reached a prediction accuracy of 94.47%. Liao et al. (2019) proposed a load forecasting method based on XGBoost and similar days. The method analyzes basic meteorological patterns and day types in relation to the load. The XGBoost loss function is handled through a Taylor expansion, and regularization terms are added to control overfitting and model complexity. Load and temperature data from a specific region were used as test sets, and the results showed that the proposed XGBoost model forecasts the load quite adequately.

Wang et al. (2020) presented an SVM-based method for evaluating driving behavior safety. It separates the feature distributions to obtain the optimal classification hyperplane and then uses the geometric margin as the evaluation index for driving behavior safety. Driving behavior, road congestion, efficiency, energy saving, and weather factors were considered with different weights, and driving behavior was divided into four levels (Good, Normal, Above the threshold, and Unfit) using SVM and K-means. XGBoost was then applied. The test yielded an average precision of 99.21% and an average recall of 98.5%, which demonstrated that the approach was effective and feasible. Among more recent uses of XGBoost, Cao et al. (2020) described a short-term traffic flow prediction model based on extreme gradient boosting; the experimental results showed its superiority over the previous prediction model.

Yang et al. (2020) described a lane-change (LC) decision method that enables autonomous vehicles to make human-like decisions. The method combines XGBoost with a deep autoencoder (DAE) network. First, an autoencoder is used to build a robust multivariate reconstruction model from time series data produced by several categories of sensors. The reconstruction errors of the DAE are then combined with other information and examined for LCID. To address the non-symmetric, multifactorial nature of the LC decision process, XGBoost is applied with Bayesian parameter optimization. To train the model on large datasets, an online training strategy is proposed that updates the model parameters with batches of data. The experimental results showed that the model can accurately identify the LC behavior of vehicles, and that with the same input parameters it performs better than other mainstream techniques.

To understand XGBoost in a broader setting, we considered Montiel et al. (2020), who introduced an adaptation of XGBoost for classifying evolving data streams. In this setting, new data arrives over time and the relationship between features and classes may change. The method creates new ensemble members as new data becomes available. The maximum ensemble size is fixed, but learning does not stop, because the model is updated on new data to stay consistent with the latest concepts. The authors also studied the use of concept drift detection to trigger updates of the ensemble. The method was tested on synthetic data with concept drift and compared against other classification methods for data streams. The results showed that the proposed approach compares favourably with earlier methods.

To become familiar with multi-layer perceptrons, we considered Ayyappa et al. (2020), who proposed an automated tumor detection procedure to help doctors identify brain tumors. An MLP-based approach with Gaussian filtering and a back-propagation (BP) neural network was evaluated and detected brain tumors with an accuracy of 93%, compared with other classification methods such as SVM and PNN. Likewise, Sunindyo and Satria (2020) investigated the use of CCTV footage for traffic prediction. The footage was processed automatically with object detection and tracking to obtain sufficient traffic data points. The traffic data (Suresh et al. 2021) was then modeled with both LSTM and MLP, and the performance was measured using RMSE. The study showed that processed CCTV footage is a practical option for congestion prediction. The best model achieved an RMSE of 1.88 with counts of vehicles, buses, and trucks as the predicted variable, using an MLP-based approach.

Khamees et al. (2020) proposed a new approach to training the MLP based on the crow search optimization algorithm. The primary goal was to reduce the error to its minimum and to increase the classification rate. Performance was assessed on several standard classification datasets to confirm that the quality of the results remains high, and the approach was compared with other algorithms such as ACO, GA, and PSO. The results showed that the crow search algorithm was the most accurate: it delivered the highest accuracy rate and solved the optimization problem effectively.

Wu et al. (2019) proposed a hybrid variable selection method for non-symmetric MLP processes. The method uses the non-negative garrote (NNG) to shrink the input weights of the MLP, and inputs whose weights shrink to zero are removed from the initial data. Variable selection is then refined with an extremal optimization (EO) algorithm. The resulting algorithm combines the good selection capacity of NNG with the accurate local search capacity of EO. Finally, two datasets and an industrial debutanizer application were used to demonstrate the efficiency of the new approach. The results showed that it achieves better performance with fewer input variables than other variable selection strategies.

Turning to mobility prediction, Irio et al. (2021) proposed a model that converts vehicle trajectory data into sequences of GPS-based locations and builds a statistical inference algorithm for online mobility prediction. The inference algorithm is based on a hidden Markov model (HMM) in which every trajectory is modeled as a subset of discrete/continuous locations. The prediction model uses the statistical information inferred so far and relies on the Viterbi algorithm to identify the subsets of locations with maximum likelihood given the earlier subsets. Duan et al. (2019) proposed a hybrid deep neural network prediction model based on convolutional LSTM (ConvLSTM). Relationships between OD flows and travel time were investigated and then used as inputs to the prediction algorithm. The work also presented a grid- and road-based technique for forecasting OD flows at several road-network scales, addressing cases in which a grid-based representation alone cannot identify traffic flows at different levels.

Zhang et al. (2020) presented a multi-task learning model with three parallel LSTM layers for co-predicting pickup and drop-off taxi demand. It also compared the performance of single-demand prediction with that of co-prediction of the two demands. Experimental results on the provided datasets showed a strong interdependence between pickup and drop-off demand, which confirmed the effectiveness of the suggested co-prediction strategy. Kankanamge et al. (2019) clustered taxi travel-time trips together with static features and then trained separate XGBoost regression models on each group. Extreme-conditioned trips and inliers were distinguished and modeled separately. The XGB-IN model produced a lower root mean squared error and mean absolute error than prevailing algorithms on real-world travel times, while the XGB-Extreme models gave reasonably accurate predictions for extreme-conditioned trips despite the limited number of such taxi rides.

Maddikunta et al. (2020) investigated a random forest regression model for predicting the battery life of IoT devices. Several data pre-processing techniques, such as dimensionality reduction, normalization, and transformation, were applied, and the model attained a predictive accuracy of about 97% across the tested scenarios. The model was also shown to perform better in sustaining the battery life of IoT devices than existing state-of-the-art regression algorithms.

Further insight into prediction methodology comes from Poongodi et al. (2020a), who used maximum likelihood estimation to formulate probabilities with a logistic regression model. An iterative regression algorithm was applied to all classes so that each of them was covered by at least one prediction structure. Poongodi et al. (2020b) used a Decentralized Autonomous Organization (DAO) to support sustainable, community-driven predictive development in real-world settings. Poongodi et al. (2020c) improved prediction of the financial situation of individuals connected with different clusters of establishments and businesses by combining several ML algorithms, including hierarchical clustering, decision trees, and KNN clustering. Extending these ideas, [24-25] adapted linear SVM by using the prediction of any two given observations rather than the observations themselves, which gave better results for their use-cases. A predictive recommendation system was built in Poongodi et al. (2019), where normalized XGBoost algorithms were used to model user credibility. Several factors based on the purchase and review history of users were taken into account to develop a smooth and flexible recommendation system.

Alazab et al. (2020) extended the smart grid CPS use-case by incorporating the Multidirectional Long Short-Term Memory (MLSTM) technique in order to predict the stability of the smart grid network accurately. A comparison with existing deep learning methods such as RNN (Guo et al. 2020), GRU, and conventional LSTM showed that the suggested MLSTM procedure outperforms (Kashif et al. 2020) the other ML prediction models. Finally, Muhammad et al. (2021) applied several supervised ML algorithms, including SVM, naive Bayes, CNN, RNN, logistic regression, and decision trees, to a labeled real-world epidemiological Coronavirus dataset in order to detect COVID-19. A major part of the procedure was data cleaning, which helped uncover strong correlations between the independent and dependent features of the dataset. The decision tree model achieved the best accuracy, 94.99%, among the techniques compared.

After careful analysis, we found a variety of drawbacks in the hybrid models used for prediction. Supervised machine learning models such as decision trees and random forest classification/regression were found to be superior to others in sensitivity, specificity, and accuracy, which motivates the use of XGBoost for the New York City taxi prediction use-case. We also noted that K-means clustering is used with the XGBoost model (Tang et al. 2020) in preference to other unsupervised ML techniques because it converges well and is scalable and adaptable. The Multi-Layer Perceptron is used in the second part of this paper since it turned out to provide higher heteroskedasticity along with the added advantage of solving complex, non-linear problems. As neural networks, MLP-based models help uncover hidden interconnections within complex real-time datasets (Tang et al. 2020), which supports efficient and improved methods (Chinmay and Rodrigues Joel 2020) for the taxi prediction application.

Dataset description

The New York City Taxi Duration dataset is taken from the Kaggle website, which provides free access to data science challenges. The dataset is used to predict the duration of a taxi ride from the factors that affect it. A second dataset, describing the weather conditions in the city, is also included. The two datasets are combined during pre-processing into a single dataset that is used for trip duration prediction. Some of the important attributes of the dataset are discussed below:

  • id, a unique identifier for each trip.
  • vendor id, a code assigned to the cab company that provided the trip record.
  • pickup datetime, the date and time at which the trip started.
  • dropoff datetime, the date and time at which the trip ended.
  • passenger count, the number of passengers on the trip.
  • pickup longitude, the longitude of the pickup location.
  • pickup latitude, the latitude of the pickup location.
  • dropoff longitude, the longitude of the drop-off location.
  • dropoff latitude, the latitude of the drop-off location.
  • store and fwd flag, a flag indicating whether the trip record was stored on the device before being forwarded to the database.
  • trip duration, the total time of the trip in seconds.

The second dataset comprises weather data for the city, including information such as the time of rainfall, sunlight, and other factors that can improve the prediction of taxi trip duration.
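One plausible way to combine the trip and weather datasets is a left join on the calendar date of the pickup, sketched below with pandas. The paper does not state how the join was done, so the join key, the underscore-separated trip column names, the weather column names, and all values are assumptions made for illustration.

```python
import pandas as pd

# Tiny hypothetical trip table (underscore column names are assumed).
trips = pd.DataFrame({
    "id": ["id0", "id1", "id2"],
    "pickup_datetime": pd.to_datetime([
        "2017-01-01 08:15:00",
        "2017-01-01 23:40:00",
        "2017-01-02 07:05:00",
    ]),
    "trip_duration": [455, 1230, 610],  # seconds
})

# Tiny hypothetical daily weather table (column names are assumptions).
weather = pd.DataFrame({
    "date": pd.to_datetime(["2017-01-01", "2017-01-02"]),
    "precipitation_in": [0.0, 0.35],
    "avg_temp_f": [38.0, 33.5],
})

# Truncate each pickup timestamp to midnight so it matches the daily weather key.
trips["date"] = trips["pickup_datetime"].dt.normalize()

# A left join keeps every trip, even if a day has no weather record.
merged = trips.merge(weather, on="date", how="left")
print(merged[["id", "date", "precipitation_in", "avg_temp_f", "trip_duration"]])
```

A left join is chosen here so that no trips are silently dropped; days without weather data would simply carry missing values.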

Proposed methodology

Our kernel is written as an IPython Notebook and uses an XGBoost model with the assistance of a mini-batch K-means clustering algorithm. The workflow of the kernel includes the following steps:

  • First, all necessary libraries, including the Sklearn library, are imported.
  • Both datasets are then imported in order to analyse the attributes related to taxi trip duration.
  • Summary statistics such as the mean, variance, and quartiles of all features are then computed. While calculating them, the data is regularly checked for mismatches.
  • Mini-batch K-means clustering is used later in the pipeline, and it is highly susceptible to outliers. The data is therefore cleaned to remove outliers so that the algorithm works efficiently.
  • The cleaned data is then analysed further for feature extraction by examining correlations in the data, to ensure maximum coverage.
  • Three distance measures between the pickup and drop-off locations are computed: Manhattan, haversine, and bearing. Manhattan distance sums the north-south and east-west displacements between the two coordinates. Because the Earth is round, a flat straight-line distance neglects an important aspect of the route, so the haversine (great-circle) distance is used extensively. The bearing gives the direction of travel from the pickup point to the drop-off point as an angle.
  • The average of all three distances is then calculated and added to the cleaned dataset as an extracted feature, which is used in further analysis.
  • Next, the mini-batch K-means algorithm is applied to cluster points on the basis of the pick-up latitude, pick-up longitude, drop-off latitude, and drop-off longitude. The centres of the resulting clusters are computed, and the trips are divided according to these clusters. These area-based clusters are added to the dataset as an extra feature.
  • As a result, about 200 features are added in the form of cluster centres: 100 pick-up clusters and 100 drop-off clusters.
  • Finally, redundant columns are removed and the backbone of the kernel, the XGBoost model, is applied to the dataset with the added features. The resulting trip duration predictions are then examined.
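The distance and cluster features in the steps above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' kernel: the Earth radius constant, the function names, and the sample coordinates are hypothetical, and the small `mini_batch_kmeans` function is a simplified re-implementation of the mini-batch update rule (a library class such as scikit-learn's `MiniBatchKMeans` would be the usual choice in practice). The demo uses 2 clusters instead of the 100 mentioned above to stay readable.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def manhattan_km(lat1, lon1, lat2, lon2):
    """North-south leg plus east-west leg, each measured with haversine."""
    return (haversine_km(lat1, lon1, lat2, lon1)
            + haversine_km(lat2, lon1, lat2, lon2))

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial direction of travel in degrees (0 = north, 90 = east)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlon = lon2 - lon1
    y = np.sin(dlon) * np.cos(lat2)
    x = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(dlon)
    return np.degrees(np.arctan2(y, x))

def nearest_center(points, centers):
    """Index of the closest centre for every point (squared Euclidean)."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def mini_batch_kmeans(points, k, batch_size=64, n_iter=200, seed=0):
    """Simplified mini-batch K-means: per-centre step size of 1/count."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    counts = np.zeros(k)
    for _ in range(n_iter):
        batch = points[rng.choice(len(points), size=batch_size)]
        for x, c in zip(batch, nearest_center(batch, centers)):
            counts[c] += 1
            centers[c] += (x - centers[c]) / counts[c]
    return centers

# Hypothetical (latitude, longitude) pairs for three trips.
pickups = np.array([[40.758, -73.985], [40.761, -73.978], [40.641, -73.778]])
dropoffs = np.array([[40.771, -73.962], [40.648, -73.790], [40.752, -73.991]])

args = (pickups[:, 0], pickups[:, 1], dropoffs[:, 0], dropoffs[:, 1])
trip_haversine = haversine_km(*args)
trip_manhattan = manhattan_km(*args)
trip_bearing = bearing_deg(*args)

# Cluster all coordinates together, then label each pickup and drop-off.
coords = np.vstack([pickups, dropoffs])
centers = mini_batch_kmeans(coords, k=2, batch_size=4, n_iter=50)
pickup_cluster = nearest_center(pickups, centers)
dropoff_cluster = nearest_center(dropoffs, centers)
```

The cluster labels can then be attached to each trip as categorical features alongside the three distance columns.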

A similar methodology is followed for the multi-layer perceptron: importing libraries and datasets (incorporating external data to improve accuracy), pre-processing the imported datasets, and so on. Rectified neural networks are then applied to handle outliers appropriately. Finally, linear neural networks are applied to obtain the desired results.

Results and discussion

As shown in Fig. 1, we plot a simple histogram of the trip duration by placing the data into 100 bins. Binning involves taking the maximum and minimum of the data, subtracting them to get the range, dividing the range by the number of bins to get the interval width, and finally grouping the data points into these intervals.
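The binning procedure described above can be written out directly. The sketch below uses plain Python; the function name and the sample durations are hypothetical, and 5 bins are used instead of 100 so the result is easy to check by hand.

```python
def bin_counts(values, n_bins):
    """Equal-width binning: returns (counts per bin, bin width)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins          # range divided by number of bins
    if width == 0:
        raise ValueError("all values are identical; cannot form bins")
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land in bin n_bins, so clamp it to the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts, width

# Hypothetical trip durations in seconds: 0, 60, 120, ..., 600.
durations = list(range(0, 601, 60))
counts, width = bin_counts(durations, n_bins=5)
print(counts, width)  # [2, 2, 2, 2, 3] 120.0
```

The last bin is closed on the right so that the maximum duration is counted, which is the usual histogram convention.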

Fig. 1 Number of training records vs trip duration

Fig. 2 then plots the number of trips against the logarithm of trip duration. The resulting Gaussian-like curve gives an intuitive picture of how taxi services operate in New York City.

Fig. 2 Logarithmic trip duration

It is important to check whether the training and testing data agree with each other. To do so, we plot the number of trips over time as a time-series line graph for both the training and testing data. This helps identify possible trends and shows whether both datasets follow the same pattern, as seen in Fig. 3.

Fig. 3 Comparison of training and testing datasets

Next, we use the New York City map border coordinates in the kernel to create the canvas on which the coordinate points are plotted. A simple scatter plot displays the actual coordinates and shows whether the pick-up points in the training and testing datasets overlap. This is shown in Fig. 4.
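The same border coordinates can serve as a filter for points that fall outside the city. The sketch below assumes an approximate rectangular bounding box; the numeric limits, names, and sample points are illustrative assumptions and are not taken from the paper.

```python
# Approximate bounding box around New York City (assumed values).
LAT_RANGE = (40.63, 40.85)
LON_RANGE = (-74.03, -73.75)

def in_city(lat, lon):
    """True if the point lies inside the assumed bounding box."""
    return (LAT_RANGE[0] <= lat <= LAT_RANGE[1]
            and LON_RANGE[0] <= lon <= LON_RANGE[1])

# Hypothetical (latitude, longitude) pickup points.
pickups = [
    (40.7580, -73.9855),   # midtown Manhattan
    (40.6413, -73.7781),   # near JFK airport
    (41.5000, -73.9000),   # too far north
    (40.7000, -75.2000),   # too far west
]
kept = [p for p in pickups if in_city(*p)]
print(len(kept), "of", len(pickups), "points kept")
```

A rectangle is only a coarse approximation of the city outline, but it is usually sufficient for removing gross coordinate errors before plotting or clustering.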

Fig. 4 Comparison of pickup and dropoff points on the map of New York

After that, we plot three graphs of the average taxi speed: by hour of the day, by day of the week, and by month of the year. These are shown in Fig. 5.
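An average-speed-by-hour aggregation of this kind might look like the sketch below. The record layout (pickup timestamp, distance in km, duration in seconds), the function name, and the sample values are assumptions; grouping by weekday or month would only change the key taken from the timestamp.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical trips: (pickup timestamp, distance in km, duration in seconds).
trips = [
    ("2017-03-01 08:10:00", 3.2, 960),
    ("2017-03-01 08:45:00", 5.0, 1200),
    ("2017-03-01 23:30:00", 6.0, 720),
]

def average_speed_by_hour(trips):
    """Mean of per-trip speeds (km/h), grouped by the pickup hour."""
    totals = defaultdict(lambda: [0.0, 0])   # hour -> [sum of speeds, trip count]
    for pickup, distance_km, duration_s in trips:
        hour = datetime.strptime(pickup, "%Y-%m-%d %H:%M:%S").hour
        speed_kmh = distance_km / (duration_s / 3600)
        totals[hour][0] += speed_kmh
        totals[hour][1] += 1
    return {hour: total / n for hour, (total, n) in totals.items()}

speeds = average_speed_by_hour(trips)
print(speeds)  # roughly 13.5 km/h at 08:00 and 30 km/h at 23:00
```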

Fig. 5 Average speeds

Finally, we visualise feature importance, as seen in Fig. 6, to see which features are most relevant for accurate results.

Fig. 6 Feature importance graph

We then run the XGBoost algorithm with the parameters shown below. These parameters can be changed as desired, but it is worth studying the XGBoost documentation first, as it explains how to fine-tune them for better performance and efficiency. The parameters used are:

  • max depth = 6
  • learning rate = 0.09
  • iteration = 250
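To show what the learning rate and iteration count control, the sketch below implements plain gradient boosting for squared error with depth-1 trees (stumps) in NumPy. It is a deliberately simplified stand-in, not XGBoost itself: XGBoost adds regularization, second-order gradient information, and deeper trees (depth 6 above). The data is synthetic and all function names are hypothetical. In XGBoost's scikit-learn style wrapper, the corresponding settings are `max_depth`, `learning_rate`, and `n_estimators`.

```python
import numpy as np

def rmse(y, pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y - pred) ** 2)))

def fit_stump(X, residual):
    """Return (feature, threshold, left_value, right_value) minimising squared error."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:   # skip the max so the right side is never empty
            left = X[:, j] <= t
            l_val, r_val = residual[left].mean(), residual[~left].mean()
            sse = (((residual[left] - l_val) ** 2).sum()
                   + ((residual[~left] - r_val) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, j, t, l_val, r_val)
    return best[1:]

def predict_stump(stump, X):
    j, t, l_val, r_val = stump
    return np.where(X[:, j] <= t, l_val, r_val)

def boost(X, y, n_rounds=250, learning_rate=0.09):
    """Fit each stump to the current residuals and add a shrunken copy of it."""
    pred = np.full(len(y), y.mean())        # start from the mean prediction
    history = [rmse(y, pred)]
    for _ in range(n_rounds):
        stump = fit_stump(X, y - pred)      # residual = negative gradient of squared loss
        pred = pred + learning_rate * predict_stump(stump, X)
        history.append(rmse(y, pred))
    return pred, history

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 2))
y = 3 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.1, size=60)

pred, history = boost(X, y)
print(f"training RMSE: {history[0]:.3f} before boosting, {history[-1]:.3f} after")
```

A smaller learning rate makes each round contribute less, so more rounds are needed, which is the trade-off behind pairing a rate of 0.09 with 250 iterations.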

After running the algorithm, the average RMSE over 250 iterations is about 0.39 for the training dataset and 0.44 for the testing dataset.
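For reference, RMSE can be computed as in the sketch below. The second function assumes, as suggested by the logarithmic duration plot but not stated explicitly in the text, that the error is measured on log-transformed durations; the sample values are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def rmsle(actual, predicted):
    """RMSE on log(1 + x), i.e. the error measured on a logarithmic scale."""
    return rmse([math.log1p(a) for a in actual],
                [math.log1p(p) for p in predicted])

# Hypothetical true and predicted trip durations in seconds.
durations = [300, 600, 1200]
predictions = [330, 540, 1500]

print(rmse(durations, predictions))    # error in seconds
print(rmsle(durations, predictions))   # error on the log scale
```

On the log scale, an error of a given ratio (for example, predicting 10% too long) counts the same for short and long trips, which suits the wide range of trip durations.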

Similarly, we apply the Multi-Layer Perceptron model to a similar dataset. It requires a deep learning setup that uses a rectifier to eliminate outliers from the data. The results are shown in Fig. 7.

Fig. 7 Results of MLP algorithm

The training accuracy of this algorithm is observed to be around 0.2740, while the testing accuracy is near 0.41. This shows that XGBoost is slightly better than the MLP model.

We successfully implemented both algorithms on the New York City Taxi Trip Duration dataset and drew several conclusions from the results. XGBoost performs better than MLP, showing slightly better accuracy. We therefore conclude that the XGBoost model is more efficient and reliable than MLP in predicting taxi trip duration.

As future work, the Multi-Layer Perceptron model could be auto-tuned to learn which features should be combined to capture the interactions between them. Variability and quantities related to the location features might also be computed in future research to localize traffic-based effects on the taxi predictions. Speed-limit features could be incorporated to support better analysis of the datasets. Further, Central Park and the associated weather conditions could be taken into account: New Yorkers might take a taxi when they are near Central Park or when the weather is severe, but not when they are near Central Park and it is raining, since they may not visit the park in bad weather. Finally, the K-Means clustering algorithm could be enhanced with additional features, such as the distance to the closest metro station or the number of bars and eateries in a given zone, to exploit the comparative qualities of the various zones. The cluster into which each data point falls could then be evaluated properly and serve as an additional important feature for our models.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

M Poongodi, Email: [email protected] .

Mohit Malviya, Email: [email protected] .

Chahat Kumar, Email: chahatkumar3007@gmail.com.

Mounir Hamdi, Email: mhamdi@hbku.edu.qa.

V Vijayakumar, Email: [email protected] .

Jamel Nebhen, Email: [email protected] .

Hasan Alyamani, Email: hjalyamani@kau.edu.sa.

  • Alazab M, Khan S, Krishnan SSR, Pham Q, Reddy MPK, Gadekallu TR. A Multidirectional LSTM Model for Predicting the Stability of a Smart Grid. IEEE Access. 2020; 8 :85454–85463. doi: 10.1109/ACCESS.2020.2991067. [ CrossRef ] [ Google Scholar ]
  • Almathami Hassan Khader Y, Win Khin Than, Vlahu-Gjorgievska Elena. Barriers and facilitators that influence telemedicine-based, real-time, online consultation at patients’ homes: systematic literature review’ J Med Internet Res. 2020; 22 (2):16407. doi: 10.2196/16407. [ PMC free article ] [ PubMed ] [ CrossRef ] [ Google Scholar ]
  • Ayyappa Y, Bekkanti A, Krishna A, Neelakanteswara P, Basha C (2020) “Enhanced and Effective Computerized Multi Layered Perceptron based Back Propagation Brain Tumor Detection with Gaussian Filtering”, (2020) Second International Conference on Inventive Research in Computing Applications (ICIRCA). July, p, Coimbatore, India
  • Butgereit L, Martinus L (2019) “A Comparison of Four Open Source Multi-Layer Perceptrons for Neural Network Neophytes”, In: 2019 International Conference on Advances in Big Data, Computing and Data Communication Systems (icABCD). Winterton, South Africa
  • Cao J, Cen G, Cen Y, Ma W (2020) “Short-Term Highway Traffic Flow Forecasting Based on XGBoost”, In: 2020 15th International Conference on Computer Science & Education (ICCSE). Delft, Netherlands
  • Chinmay C, Rodrigues Joel JPC. A comprehensive review on device-to-device communication paradigm: trends, challenges and applications. Wireless Personal Commun. 2020; 114 (1):185–207. doi: 10.1007/s11277-020-07358-3. [ CrossRef ] [ Google Scholar ]
  • Duan Zongtao, Zhang Kai, Chen Zhe, Liu Zhiyuan, Tang Lei, Yang Yun, Ni Yuanyuan. Prediction of city-scale dynamic taxi origin-destination flows using a hybrid deep neural network combined with travel time. IEEE Access. 2019; 7 :127816–127832. doi: 10.1109/ACCESS.2019.2939902. [ CrossRef ] [ Google Scholar ]
  • Guo Z, Shen Y, Bashir AK, Imran M, Kumar N, Zhang D, Yu K (2020) Robust spammer detection using collaborative neural network in internet of thing applications. IEEE Internet of Things J 1–1. 10.1109/JIOT.2020.3003802
  • Gupta A, Sharma S, Goyal S, Rashid M (2020) “Novel XGBoost Tuned Machine Learning Model for Software Bug Prediction’, 2020 International Conference on Intelligent Engineering and Management (ICIEM). United Kingdom, London
  • Irio L, Ip A, Oliveira R, Luís M. An adaptive learning-based approach for vehicle mobility prediction. IEEE Access. 2021; 9 :13671–13682. doi: 10.1109/ACCESS.2021.3052071. [ CrossRef ] [ Google Scholar ]
  • Jeyachandran A, Poongodi M (2018) Securing Cloud information with the use of bastion algorithm to enhance confidentiality and protection. Int J Pure Appl Math 118(24)
  • Kabán Ata (2019) “Compressive Learning of Multi-layer Perceptrons: An Error Analysis”, In: 2019 International Joint Conference on Neural Networks (IJCNN). Budapest, Hungary
  • Kankanamge KD, Witharanage YR, Withanage CS, Hansini M, Lakmal D, Thayasivam U (2019) “Taxi trip travel time prediction with isolated XGBoost Regression”, In: 2019 Moratuwa Engineering Research Conference (MERCon). Moratuwa, Sri Lanka, pp. 54–59
  • Kashif BA, Suleman K, Rabadevi B, Deepa N, Alnumay WS, Gadekallu TR, Maddikunta PKR (2020) “Comparative analysis of machine learning algorithms for prediction of smart grid stability”, Int Trans Electr Energy Syst, Feb
  • Khamees M, Ahmed WS, Abbas SQ (2020) “Train the Multi-Layer Perceptrons Based on Crow Search Algorithm”, In: 2020 1st. Information Technology To Enhance e-learning and Other Application (IT-ELA), Baghdad, Iraq, July
  • Koo J, Faseeh QNM, Siddiqui IF, Abbas A, Bashir AK. IoT-enabled directed acyclic graph in spark cluster. J Cloud Comput. 2020; 9 (1):1–5. doi: 10.1186/s13677-020-00195-6. [ CrossRef ] [ Google Scholar ]
  • Liao X, Cao N, Li M, Kang X (2019) “Research on Short-Term Load Forecasting Using XGBoost Based on Similar Days”, In: 2019 International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS). Changsha, China
  • Maddikunta PKR, Srivastava G, Gadekallu TR, Deepa N, Boopathy P. Predictive model for battery life in IoT networks. IET Intel Transport Syst. 2020; 14 (11):1388–1395. doi: 10.1049/iet-its.2020.0009. [ CrossRef ] [ Google Scholar ]
  • Montiel J, Mitchell R, Frank E, Pfahringer B, Abdessalem T, Bifet A (2020) “Adaptive XGBoost for Evolving Data Streams”, In: 2020 International Joint Conference on Neural Networks (IJCNN). Glasgow, United Kingdom
  • Muhammad LJ, Algehyne EA, Usman SS, Ahmad A, Chinmay C, Mohammed IA (2021) Supervised machine learning models for prediction of COVID-19 infection using epidemiology dataset. SN Comput Sci [ PMC free article ] [ PubMed ]
  • Poongodi M, Ashutosh Sharma, Vijayakumar V, Vaibhav Bhardwaj, Parkash Sharma Abhinav, Razi Iqbal, Rajiv Kumar. Prediction of the price of Ethereum blockchain cryptocurrency in an industrial finance system. Comput Electr Eng. 2020; 81 :106527. doi: 10.1016/j.compeleceng.2019.106527. [ CrossRef ] [ Google Scholar ]
  • Poongodi M, Vijayakumar V, Chilamkurti N. Bitcoin price prediction using ARIMA model. Int J Int Technol Secured Trans. 2020; 10 (4):396–406. doi: 10.1504/IJITST.2020.108130. [ CrossRef ] [ Google Scholar ]
  • Poongodi M, Vijayakumar V, Rawal B, Bhardwaj V, Agarwal T, Jain A, Ramanathan L, Sriram VP. Recommendation model based on trust relations & user credibility. J Intell Fuzzy Syst. 2019; 36 (5):4057–4064. doi: 10.3233/JIFS-169966. [ CrossRef ] [ Google Scholar ]
  • Poongodi M, Hamdi M, Vijayakumar V, Rawal BS (2020b) and ”, 2020 IEEE 3rd 5G World Forum (5GWF). Bangalore, India, pp 1–6
  • Ran D, Jiaxin H, Yuzhe H (2020) “Application of a Combined Model based on K-means++ and XGBoost in Traffic Congestion Prediction”, In: 2020 5th International Conference on Smart Grid and Electrical Automation (ICSGEA). Zhangjiajie, China
  • Sharma R, Schommer C, Vivarelli N (2020) “Building up Explainability in Multi-layer Perceptrons for Credit Risk Modeling”, In: 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA). Australia, Sydney
  • Sunindyo WD, Satria ASM (2020) “Traffic Congestion Prediction Using Multi-Layer Perceptrons And Long Short-Term Memory”, In: 2020 10th Electrical Power. Electronics, Communications, Controls and Informatics Seminar (EECCIS), Malang, Indonesia
  • Suresh P, Sundresan P, Mujahid T, Ganthan N, Chinmay C, Saju M, Zeeshan B, Mohammad TQ (2021) ANN base novel approach to detect node failure in wireless sensor network, CMC-Computers. Tech Science Press, Materials & Continua
  • Tang Q, Xia G, Zhang X, Long F (2020) “A Customer Churn Prediction Model Based on XGBoost and MLP”, In: 2020 International Conference on Computer Engineering and Application (ICCEA). Guangzhou, China
  • Wang X, Lou XY, Hu SY, He SC (2020) “Evaluation of Safe Driving Behavior ofTransport Vehicles Based on K-SVM-XGBoost”, In: 2020 3rd International Conference on Advanced Electronic Materials, Computers and Software Engineering (AEMCSE). Shenzhen, China
  • Wu X, Li Y, Wu H, Zhang F, Sun K (2019) “A hybrid variable selection algorithm for multi-layer perceptron with nonnegative garrote and extremal optimization”, In: 2019 19th International Conference on Control, Automation and Systems (ICCAS). Jeju, Korea (South)
  • Yang B, He Y, Liu H, Chen Y, Ji Z (2020) “A Lightweight Fault Localization Approach based on XGBoost”, 2020 IEEE 20th International Conference on Software Quality, Reliability and Security (QRS). Macau, China
  • Zhang C, Zhu F, Wang X, Sun L, Tang H, Lv Y (2020) Taxi demand prediction using parallel multi-task learning model. IEEE Trans Intell Trans Syst 1–10. 10.1109/TITS.2020.3015542

Predicting New York Taxi Trip Duration Based on Regression Analysis Using ML and Time Series Forecasting Using DL

  • Conference paper
  • First Online: 23 August 2022

  • S. Ramani,
  • Anish Ghiya,
  • Pusuluri Sidhartha Aravind,
  • Marimuthu Karuppiah &
  • Danilo Pelusi

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 458)

Taxi fare and trip duration depend on many factors, such as traffic along the route or late-night driving, which may be slower because of restricted night vision. This research work visualizes the factors that might affect trip duration, such as the day of the week, pickup location, drop-off location, and time of pickup. It analyses the dataset obtained from the NYC Taxi and Limousine Commission (TLC), which contains taxi trips from January 2016 to June 2016 with GPS coordinates. After the data is analysed, the taxi trip duration is predicted using multiple machine learning and deep learning models. The models are compared on the mean squared error and the \(R^2\) score, obtained both with and without scaling the data. The maximum \(R^2\) score of 0.99 was attained with the recurrent neural network (RNN) using time series analysis, followed by 0.97 with XGBRegressor; an increment of 0.6% was observed when the values were normalized with a log transform in the regression setting.
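The two ingredients named in the abstract, a log transform of the target and the \(R^2\) score, can be sketched as below. The use of `log1p` (rather than a plain logarithm) and the sample durations are assumptions for illustration; the abstract does not say which variant the authors used.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up trip durations in seconds, with one long outlier.
durations_s = [300.0, 600.0, 1200.0, 7200.0]

# log(1 + d) compresses the long right tail before a model is fitted.
log_durations = [math.log1p(d) for d in durations_s]

# Predictions made in log space are mapped back with the inverse transform.
restored = [math.expm1(v) for v in log_durations]

print(r2_score([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```

A model that always predicts the mean scores 0 and a perfect model scores 1, which is why values such as 0.97 and 0.99 indicate a close fit.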


Author information

Authors and affiliations.

School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, 632014, India

S. Ramani, Anish Ghiya & Pusuluri Sidhartha Aravind

Department of Computer Science and Engineering, SRM Institute of Science and Technology, Delhi-NCR Campus, Ghaziabad, Uttar Pradesh, 201204, India

Marimuthu Karuppiah

Faculty of Communications Sciences, University of Teramo, Teramo, Italy

Danilo Pelusi

Corresponding author

Correspondence to Danilo Pelusi.

Editor information

Editors and affiliations.

Gnanmani College of Engineering and Technology, Namakkal, India

Jennifer S. Raj

Department of Computer Science, Kennesaw State University, Kennesaw, GA, USA

Faculty of Communication Sciences, University of Teramo, Teramo, Italy

Automatics and Applied Software, Aurel Vlaicu University of Arad, Arad, Romania

Valentina Emilia Balas

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper.

Ramani, S., Ghiya, A., Aravind, P.S., Karuppiah, M., Pelusi, D. (2022). Predicting New York Taxi Trip Duration Based on Regression Analysis Using ML and Time Series Forecasting Using DL. In: Raj, J.S., Shi, Y., Pelusi, D., Balas, V.E. (eds) Intelligent Sustainable Systems. Lecture Notes in Networks and Systems, vol 458. Springer, Singapore. https://doi.org/10.1007/978-981-19-2894-9_2

DOI: https://doi.org/10.1007/978-981-19-2894-9_2

Published: 23 August 2022

Publisher Name: Springer, Singapore

Print ISBN: 978-981-19-2893-2

Online ISBN: 978-981-19-2894-9

eBook Packages: Intelligent Technologies and Robotics (R0)

TRIP DURATION PREDICTION: NEW YORK TAXI RIDES USING XGBoost (NYC Taxi Trip Duration Dataset)

Shivam Attree

Predicting trip duration is not a new problem. With the Google Maps API, one can find the estimated time needed to move between two points in a city. However, a detailed analysis of the factors affecting a trip between two points can be very useful for accurate and robust prediction. Trip duration is not as simple as it seems: it is data dependent and is governed by many factors apart from distance and speed. This research focuses on the important factors that can be used as attributes for trip duration prediction in New York City. Taxi vendors can use this data to provide better services to users. The research work not only uses a prediction model but also gives an in-depth analysis of the factors associated with New York City taxi trips. A city like New York is expected to show many factors and variations with respect to trip durations. The dataset used for training and testing is multi-dimensional and requires a lot of pre-processing. This research work applies machine learning algorithms such as linear regression, random forests, lasso regression, and XGBoost to the task. XGBoost is the final algorithm used, as it yields the best result compared with the other methods employed for the same task. A root mean square error of 0.4409 was achieved when the test data, consisting of about 600000 data points, was given as input to the trained model.

Related Papers

Vol. 19 No. 2 FEBRUARY 2021 International Journal of Computer Science and Information Security (IJCSIS)

Journal of Computer Science IJCSIS

Travel time plays a crucial role in the intelligent transport systems of metropolitan cities. Accurately predicting taxi trip travel time helps commuters plan their trips better and reach their destinations on time. Most existing techniques use supervised learning models to estimate the travel time, but the performance obtained from these models is insufficient. In this paper, we propose a novel approach that predicts travel time by using both supervised and unsupervised techniques with a large historic dataset, and we compare this method with supervised techniques. The clustering approach of unsupervised learning, used along with supervised learning, helps to enhance the performance of a predictive model. Clustering segments nearby location data into similar groups, which helps in finding the underlying pattern within the large dataset. A supervised algorithm is then applied to this clustered data. Machine Learning (ML) techniques are used to predict the trip time of taxi trips: Random Forest Regressor (RFR) and XGBoost Regressor (XGBR), which are supervised, and RFR with k-means and XGBR with k-means, which combine supervised and unsupervised techniques. The results show that a combination of supervised and unsupervised models performs better than supervised models alone. The comparison also shows that RFR and RFR with k-means perform better than XGBR and XGBR with k-means, respectively. RFR with k-means outperforms the other models with an accuracy of 84.6%. Along with better performance, RFR with k-means also significantly reduces the error rate of the model.
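The clustering step described in this abstract can be illustrated with a small k-means sketch. This is plain Lloyd's algorithm written for the example, not the paper's code: the function name `kmeans`, the fixed initial centroids, and the sample pickup points are hypothetical, and squared Euclidean distance on raw longitude and latitude is only a rough proxy for ground distance.

```python
def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    centroids = list(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda k: (x - centroids[k][0]) ** 2 + (y - centroids[k][1]) ** 2)
            for x, y in points
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, labels


# Made-up pickup points as (longitude, latitude): two near each other, two far away.
pickups = [(-73.99, 40.75), (-73.98, 40.76), (-73.78, 40.64), (-73.79, 40.65)]
_, cluster_id = kmeans(pickups, centroids=[pickups[0], pickups[2]])
print(cluster_id)
```

The resulting `cluster_id` would be appended to each trip as an extra column before a supervised regressor is trained on the data.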

IRJET Journal

Taxi demand prediction is the process of using historical data to forecast future taxi requests in a particular area. Accurate, real-time demand forecasting lets managers pre-allocate taxi resources in cities, which helps drivers find clients more quickly and cuts down on passenger waiting times. This project aims to choose the best model for predicting taxi demand, using Machine learning techniques such as regression analysis and time series forecasting. Various baseline models are used, including moving averages (simple, weighted, and exponential), linear regression with grid search, random forest regressor with random search, and XGBoost regressor with random search. We determine which model is most suitable for predicting the output using the metrics we obtain.
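The three moving-average baselines named above can be sketched as one-step forecasts. The function names, the linear weighting scheme, and the demand series are assumptions made for the example rather than details taken from the project.

```python
def simple_moving_average(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)


def weighted_moving_average(series, window):
    """Like the simple average, but newer observations get larger weights."""
    recent = series[-window:]
    weights = range(1, len(recent) + 1)  # 1 for the oldest, n for the newest
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)


def exponential_moving_average(series, alpha):
    """Exponentially weighted average; alpha in (0, 1] sets how fast old data fades."""
    estimate = series[0]
    for value in series[1:]:
        estimate = alpha * value + (1 - alpha) * estimate
    return estimate


demand = [10, 12, 14, 16]  # made-up pickups per time bin
print(simple_moving_average(demand, 2))
print(weighted_moving_average(demand, 2))
print(exponential_moving_average(demand, 0.5))
```

On a rising series like this one, the weighted variant tracks the trend slightly more closely than the simple mean because the latest bin counts more.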

Dillip Rout

International Journal of Scientific Research in Computer Science, Engineering and Information Technology

International Journal of Scientific Research in Computer Science, Engineering and Information Technology IJSRCSEIT

Accurately predicting the travel time between two destinations is an essential aspect of traffic monitoring and facilitating ridesharing services. However, this is a highly complex and challenging task, which involves a multitude of variables that cannot be resolved straightforwardly. Previous studies on travel time prediction have focused on evaluating the duration of individual road segments or specific sub-paths before integrating the necessary time for each sub-path. While this method may provide some insight, it may result in an incorrect or imprecise time estimate. To address this issue, this research aims to utilize machine learning techniques to predict the duration of trips in ride-sharing networks, using the Uber movement dataset. The proposed system employs Python programming to calculate the distance between the pickup and drop-off locations. Furthermore, the study explores the various factors that affect travel time in a descriptive analysis. This includes examining the impact of traffic congestion, weather conditions, and road construction on travel time. The suggested approach incorporates a robust regression model known as Huber regression to enhance the accuracy of trip duration prediction and increase the precision of the algorithm. The Huber regression model is robust to outliers, making it suitable for the Uber movement dataset, which may contain unexpected and extreme values. The dataset is processed using k-fold cross-validation, which splits the dataset into k subsets, with each subset used for validation once while the remaining subsets are used for training the model. However, this approach presents several challenges that need to be addressed, including the difficulties with tracking variables, the need for extensive data transformation due to the diverse data types contained in the dataset, and the challenge of handling unlabeled places during the segmentation of geographical data.
Additionally, outliers in the dataset can lead to substantial data differences and affect the model's accuracy. Data normalization is slow due to the time-consuming nature of reading duplicated information. To mitigate these issues, additional study is required to improve the model's layout and address the challenges of working with the Uber movement dataset.
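Two ideas from this abstract, the Huber loss and k-fold splitting, can be sketched in plain Python. The threshold `delta=1.0`, the contiguous (unshuffled) folds, and the function names are assumptions for illustration; the abstract does not give the study's actual settings.

```python
def huber_loss(residual, delta=1.0):
    """Quadratic for small residuals, linear for large ones, which limits outlier influence."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)


def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


folds = list(k_fold_indices(5, 2))
print(folds)
print(huber_loss(0.5), huber_loss(3.0))
```

A residual of 3.0 costs 2.5 here instead of the 4.5 that a squared-error loss of the same form would charge, which is the sense in which the model is robust to extreme values.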

International Journal for Research in Applied Science & Engineering Technology (IJRASET)

IJRASET Publication

Taxis play a crucial role in transportation, especially in urban areas. Predicting the future demand for taxis in a particular geographical location will greatly help internet-based transportation companies like Ola and Uber. It can drastically decrease the waiting time of customers and passengers, and it helps taxi drivers move to locations where demand is high, eventually making passengers, drivers, and companies happy. In this project, we predict the demand for taxis in a particular location for the next 10 min using previous time series data. We want to perform this regression task using machine learning models with high accuracy, then apply deep learning models and compare the results. We propose the best-suited, most accurate model for the problem. It will greatly help companies in managing taxi fleets in cities.

The Land Transportation Sector is one of the key sectors in the Philippine economy, particularly in Metro Manila. With the rapid urbanization of the Philippines, the urban transport infrastructure is expected to come under pressure, posing a major risk of urban transport degradation that results in longer travel times and economic and productivity losses. In light of this, the Land Transportation Franchising and Regulatory Board (LTFRB), along with DOST-ASTI, has initiated a project to implement a bus management system for Public Utility Vehicles utilizing real-time GPS location data. This study establishes a travel time prediction for buses on a specific route. The travel time estimation was performed using Extremely Randomized Trees, a supervised machine learning algorithm. The resulting prediction set had a coefficient of determination score indicative of good predictive performance for travel time prediction.

arXiv (Cornell University)

Human-centric intelligent systems

Prof. Arnab K. Laha

This research studies predictive analysis, a method of analysis in Machine Learning. Many companies, such as Ola and Uber, use Artificial Intelligence and machine learning technologies to solve the problem of accurate fare prediction. We propose this paper after a comparative analysis of algorithms such as regression and classification, which are useful in prediction modeling for obtaining the most accurate value. This research will be helpful to those involved in fare forecasting. In the past, the fare depended only on distance, but with advances in technology a cab’s fare depends on many factors, such as time, location, number of passengers, traffic, number of hours, and base fare. The study is based on supervised learning, one application of which is prediction.

Marco A. Casanova

This paper investigates the application of a Machine Learning technique to predict the time that a vehicle will spend travelling between any two points in an approximated area. The prediction relies on a learning process based on historical data about the movements of the vehicles, taking into account a set of semantic variables, to obtain the estimated time.

NYC taxi drivers could get charged full congestion pricing toll

Some New York City taxi drivers could get hit by the full congestion pricing charge if one of the city’s two taxi technology companies doesn’t play ball with the MTA.

Curb, one of the taxi-tech firms that runs credit card readers for taximeters in the city, is attempting to charge the MTA a service fee in order to add a per-trip congestion toll on top of any taxi trips within the congestion pricing zone, the Daily News has learned.

The MTA’s congestion pricing plan, which will charge a base toll of $15 per day to cars entering Manhattan at 60th St. or below, includes a carve-out for taxis and other for-hire vehicles.

Rather than charge taxi drivers the $15 toll, the congestion pricing plan will assess a per-trip surcharge for trips through the congestion zone, passing the toll on to passengers.

The surcharge will be $1.25 for taxis and $2.50 for Uber and Lyft trips.

But that scheme requires the app and meter companies to sign an agreement with the MTA to render that surcharge to the transit agency, according to a letter sent by the Taxi and Limousine Commission Tuesday to medallion owners, livery base operators, and others in the industry.

“[Ubers and Lyfts] dispatched by a base that has not entered into the agreement, as well as taxis and [green cabs] utilizing a [meter company] that has not entered into the agreement, may be charged a $15 toll up to once per day paid by the vehicle owner, instead of the per-trip charge paid by the passenger, for entering the Congestion Relief Zone,” the TLC letter reads.

Meter companies “must agree to collect and remit the per-trip charge in order for the per-trip charge to apply to trips completed in vehicles that are equipped with that [company]’s technology system.”

Taxi and transit sources told The News Wednesday that Curb has been haggling with the MTA, trying to assess a service fee for sending the congestion toll to the transit agency.

A spokesman for Curb rejected the idea the company was “haggling.”

Curb spokesman Zak Hawke on Thursday said the firm was cooperating with the MTA and engaged in what he called “a typical review” of the agreement.

“[T]here is a significant cost associated with the technology, manpower and infrastructure needed to collect and reconcile the mandated Congestion Toll Zones fees on behalf of the MTA,” he said.

“It is reasonable for these costs to be acknowledged and for Curb to be compensated for these efforts, either now or in the future.”

Curb is committed to an on-time implementation of congestion pricing tolls, the spokesman said.

MTA chairman Janno Lieber characterized pending agreements with meter companies and livery bases as a “paperwork” issue.

“We need folks who are going to be responsible for the interactions with the MTA and the collection of the money to have agreed with the process,” Lieber said.

Lieber did not name Curb or any other firm when asked about the situation Wednesday.

“The drivers themselves are the ones we’re trying to help,” he said. “We just don’t want them to get stuck if the company that is responsible for processing this — for their own reasons — does not execute an agreement.”

“Maybe some [drivers] will choose to move to another meter company,” he added.

Asked if the MTA would be open to a nominal fee, Lieber said no.

“I don’t think that’s called for,” he said. “There are a lot of taxes and fees that are already collected through these meters.”

Curb is one of two major players in the city’s taxi-meter industry. It’s preferred by owner-operator drivers over rival system Arro, owned by CMT Group, which is predominantly used by larger taxi fleets.

Sources said roughly 75% of taxis in the city have meters run by Curb.

A TLC spokesman said Wednesday that the city agency was not part of any arrangement between livery bases or meter companies and the MTA, but reiterated the TLC commissioners' support of a per-trip charge.

“Hopefully MTA and these providers can come to an agreement that serves everyone,” TLC spokesman Jason Kersten said in a statement.

Bhairavi Desai, head of the New York Taxi Workers Alliance, said that while she doubted cabbies would be hit with the full $15, she worried that Curb or other providers might try to pass any fee the MTA rejected onto drivers instead.

Desai told The News that cabbies should be able to pay the MTA themselves.

“If Curb or Arro are going to present a problem, let the medallion owners pay directly,” she said.

Congestion pricing, which is currently facing legal challenges in New York and New Jersey, is expected to begin on June 30.

©2024 New York Daily News. Visit nydailynews.com. Distributed by Tribune Content Agency, LLC.

NYC's best and worst times to travel on Memorial Day Weekend

By Jesse Zanger

Updated on: May 24, 2024 / 12:55 PM EDT / CBS New York

NEW YORK - If you're going to hit the road this Memorial Day Weekend for the unofficial start of summer, you're not alone. 

Holiday travel is already underway in and around New York City ahead of the long weekend. 

AAA projects nearly 44 million people will be traveling more than 50 miles from May 23 through 27. That's a 4% hike from last year and, for the first time, will exceed pre-pandemic levels.

"We're projecting an additional one million travelers this holiday weekend compared to 2019, which not only means that we're moving beyond pandemic-era lulls, but also signals a very busy summer travel season ahead," Alec Slatky, of AAA Northeast, said. 

That means a record number of road trips is expected. AAA estimates 38.4 million people will travel by car, the highest number for the holiday since AAA began tracking in 2000.

Some 3.5 million people will be flying this week, a 4.8% hike over last year. 

A record 6.4 million people are expected to use Port Authority airports, bridges and tunnels, the Port Authority said. Passengers on domestic flights should arrive at the airport at least two hours in advance, and for international travel, at least three hours is recommended.

Here are the worst and best times to hit the road, according to transportation analytics company INRIX: 

Thursday, May 23: Worst travel time 12 - 6 p.m. | Best travel time before 11 a.m., after 7 p.m.

Friday, May 24: Worst travel time 12 - 6 p.m. | Best travel time before 11 a.m., after 7 p.m.

Saturday, May 25: Worst travel time 2 - 5 p.m. | Best travel time before 1 p.m., after 6 p.m.

Sunday, May 26: Worst travel time 3 - 7 p.m. | Best travel time before 1 p.m.

Monday, May 27: Worst travel time 3 - 7 p.m. | Best travel time after 7 p.m.

AAA says booking data shows New York City is the #3 domestic travel destination for the holiday weekend. 

Busiest NYC bridges and tunnels for Memorial Day Weekend

Here's a closer look at last year's data on how the various bridges and tunnels stack up, in terms of volume, over the weekend. 

*% change from 2019 and 2013 is calculated using only Brooklyn-bound Verrazzano traffic from 2023, to account for the implementation of two-way tolling in 2020.

The busiest time to travel, according to AAA, is expected to be Thursday and Friday afternoons, when commuters and travelers will both be on the roads.

A closer look at the Verrazzano, Throgs Neck + Whitestone Bridges

Verrazzano Bridge

  • Brooklyn-bound: The busiest time, other than the Thursday and Friday morning commutes, is expected to be Sunday in the late afternoon and evening and Monday evening, according to AAA. In 2023, there were more than 6,500 cars per hour from 1 - 8 p.m. Sunday and 5 - 8 p.m. Monday.
  • Staten Island-bound: Thursday evening and Friday late afternoon. Last year, there were more than 8,000 cars per hour from 4 - 7 p.m. Thursday and 3 - 5 p.m. Friday. 

Throgs Neck + Whitestone Bridges

  • Bronx-bound: The busiest times are Saturday morning and Sunday morning. In 2023, there were more than 8,000 cars per hour from 9 a.m. - 1 p.m. Saturday and 11 a.m. - 1 p.m. Sunday.
  • Queens-bound: The busiest times are Thursday evening and Sunday evening. Last year, there were more than 9,000 cars per hour from 4 - 7 p.m. Thursday and more than 8,700 cars per hour from 3 - 7 p.m. Sunday. 

PATH train service

PATH Trains will operate on a Saturday schedule on Memorial Day. 

Jesse Zanger is managing editor of CBS New York. Jesse has previously worked for the Fox News Channel and Spectrum News NY1. He covers regional news around the Tri-State Area, with a particular focus on breaking news and extreme weather.

New York City congestion pricing, first in the nation, is approved at $15 and up for vehicles

A majority of the MTA board voted Wednesday in favor of New York City congestion pricing, green-lighting the controversial plan that will charge cars $15 to enter Manhattan below 61st Street and hit trucks with even higher tolls starting in just a few months.

Only one of the 12 board members opposed the proposal. The no vote was Nassau County board member David Mack.

The approval, essentially a rubber stamp of “clarifications” like exemptions, given the plan itself was approved last year, means congestion pricing can begin following a 60-day public information campaign and a concurrent 30-day testing period.

Almost all 110 toll readers are already installed, positioning the MTA to begin collecting as soon as June 15. Federal judges on either side of the Hudson River could still block the plan, though the MTA expects that not to be the case.

The board overwhelmingly voted in favor of the plan in December, saying charging drivers to enter a swath of Manhattan would contribute millions of dollars to the aging, cash-strapped transit system. Wednesday’s vote is a critical final approval of “clarifications” and exemptions.

As NBC New York reported earlier this week, most of the cars likely to get full exemptions will be government vehicles.

The toll will not be in effect for taxis, but drivers will be charged a $1.25 surcharge per ride. The same policy applies to Uber, Lyft and other rideshare drivers, though their surcharge will be $2.50.

Despite what MTA officials say were overwhelming public comments “in favor” of congestion pricing by a 2-to-1 margin, a number of groups have stood in opposition.

Taxi advocates have blasted the plan, calling it “a reckless proposal that will devastate an entire workforce.”

Public hearings earlier in March paved the way for Wednesday’s vote. For its part, the MTA has insisted that it is merely implementing a state law aimed at cleaning the air and modernizing mass transit.

Pedestrians and cars move along First Avenue in pouring rain in Manhattan.

How does congestion pricing work?

Congestion pricing will impact any driver entering what is being called the Central Business District (CBD), which stretches from 60th Street in Manhattan and below, all the way down to the southern tip of the Financial District. In other words, most drivers entering midtown Manhattan or below will have to pay the toll, according to the board’s report.

All drivers of cars, trucks, motorcycles and other vehicles would be charged the toll. Different vehicles will be charged different amounts — here’s a breakdown of the prices:

Passenger vehicles: $15

Small trucks (like box trucks, moving vans, etc.): $24

Large trucks: $36

Motorcycles: $7.50

The $15 toll is about a midway point between previously reported possibilities, which have ranged from $9 to $23.

The full, daytime rates will be in effect from 5 a.m. until 9 p.m. each weekday, and 9 a.m. until 9 p.m. on the weekends. The board called for toll rates in the off-hours (from 9 p.m.-5 a.m. on weekdays, and 9 p.m. until 9 a.m. on weekends) to be about 75% less — about $3.50 instead of $15 for a passenger vehicle.

Drivers will only be charged to enter the zone, not to leave it or stay in it. That means residents who enter the CBD and circle their block to look for parking won’t be charged.

Only one toll will be levied per day — so anyone who enters the area, then leaves and returns, will still only be charged the toll once for that day.
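The rate schedule and once-a-day rule described above can be sketched in a few lines of Python. This is only an illustration of the rules as reported here, not the MTA's billing logic. The function names are hypothetical, the off-hours rate is modeled as exactly 75% off (the board said "about 75%"), the 9 p.m. boundary is treated as off-hours, and the sketch assumes the first entry of the day is the one that gets charged, which the report does not specify.

```python
from datetime import datetime, time

# Daytime (full) tolls in dollars by vehicle class, as listed above.
PEAK_TOLLS = {
    "passenger": 15.00,
    "small_truck": 24.00,
    "large_truck": 36.00,
    "motorcycle": 7.50,
}

# Assumption: off-hours tolls are exactly 75% lower than the daytime rate.
# The report says "about 75% less", so actual amounts may differ.
OFF_HOURS_DISCOUNT = 0.75


def is_peak(entry: datetime) -> bool:
    """Return True if the entry falls in the full-rate window.

    Weekdays: 5 a.m. to 9 p.m. Weekends: 9 a.m. to 9 p.m.
    Assumption: 9 p.m. itself counts as off-hours.
    """
    is_weekday = entry.weekday() < 5  # Monday is 0, Friday is 4
    start = time(5, 0) if is_weekday else time(9, 0)
    end = time(21, 0)
    return start <= entry.time() < end


def toll_for_entry(vehicle: str, entry: datetime) -> float:
    """Toll for a single entry into the zone, ignoring the daily cap."""
    full_rate = PEAK_TOLLS[vehicle]
    if is_peak(entry):
        return full_rate
    return round(full_rate * (1 - OFF_HOURS_DISCOUNT), 2)


def daily_tolls(vehicle: str, entries: list) -> dict:
    """Charge at most one toll per calendar day.

    Assumption: the first entry of each day is the one that is charged.
    Returns a mapping of date to the toll charged on that date.
    """
    charged = {}
    for entry in sorted(entries):
        day = entry.date()
        if day not in charged:
            charged[day] = toll_for_entry(vehicle, entry)
    return charged
```

Under these assumptions, a passenger car that enters twice on a Monday during the day and once late Tuesday night would be charged the full rate once for Monday and the reduced rate once for Tuesday.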

The review board said that implementing their congestion pricing plan is expected to reduce the number of vehicles entering the area by 17%. That would equate to 153,000 fewer cars in that large portion of Manhattan. They also predicted that the plan would generate $15 billion, a cash influx that could be used to modernize subways and buses.

Can I get a discount?

Many groups had been hoping to get exemptions, but very few will avoid having to pay the toll entirely. That small group is limited to specialized government vehicles (like snowplows) and emergency vehicles.

Low-income drivers who earn less than $50,000 a year can apply to pay half the price on the daytime toll, but only after the first 10 trips in a month.
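To see what that discount means over a month, here is a rough Python sketch. The function name and parameters are hypothetical, and it assumes every trip is a daytime passenger-vehicle entry at the $15 rate and that the driver has applied and been approved.

```python
def monthly_daytime_cost(trips: int, annual_income: float,
                         approved: bool = True, toll: float = 15.00) -> float:
    """Estimate a month of daytime tolls for a passenger vehicle.

    Assumptions (not from the MTA): each trip is one charged daytime
    entry, and the half-price rate applies from the 11th trip onward
    for approved drivers earning under $50,000 a year.
    """
    if approved and annual_income < 50_000:
        full_price_trips = min(trips, 10)
        half_price_trips = max(trips - 10, 0)
        return full_price_trips * toll + half_price_trips * toll / 2
    return trips * toll


# A qualifying driver making 22 daytime trips in a month:
# 10 trips at $15 plus 12 trips at $7.50.
print(monthly_daytime_cost(22, 45_000))  # 240.0
# The same 22 trips without the discount.
print(monthly_daytime_cost(22, 80_000))  # 330.0
```

A driver who makes 10 or fewer trips in a month sees no savings at all under this rule, since the discount only starts with the 11th trip.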

While not an exemption, there will also be so-called “crossing credits” for drivers using any of the four tunnels to get into Manhattan. That means those who already pay at the Lincoln or Holland Tunnel, for example, will not pay the full congestion fee. The credit amounts to $5 per ride for passenger vehicles, $2.50 for motorcycles, $12 for small trucks and $20 for large trucks.

Drivers from Long Island and Queens using the Queens-Midtown Tunnel will get the same break, as will those using the Brooklyn-Battery Tunnel. Those who come over the George Washington Bridge and go south of 60th Street would see no such discount, however.
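The crossing credits can be expressed as a simple lookup. The Python sketch below uses the credit amounts and tunnels named above. The function name is hypothetical, and it assumes the credit is subtracted from the daytime rate only, since the report does not say how credits apply during off-hours.

```python
# Daytime tolls and crossing credits in dollars, as reported above.
DAYTIME_TOLLS = {
    "passenger": 15.00,
    "motorcycle": 7.50,
    "small_truck": 24.00,
    "large_truck": 36.00,
}

CROSSING_CREDITS = {
    "passenger": 5.00,
    "motorcycle": 2.50,
    "small_truck": 12.00,
    "large_truck": 20.00,
}

# The four tunnels that earn a credit.
CREDITED_CROSSINGS = {
    "Lincoln Tunnel",
    "Holland Tunnel",
    "Queens-Midtown Tunnel",
    "Brooklyn-Battery Tunnel",
}


def net_daytime_toll(vehicle: str, crossing: str = "") -> float:
    """Daytime congestion toll after any crossing credit.

    Assumption: the credit applies to the daytime rate only. Any other
    crossing, such as the George Washington Bridge, earns no credit.
    """
    toll = DAYTIME_TOLLS[vehicle]
    if crossing in CREDITED_CROSSINGS:
        toll -= CROSSING_CREDITS[vehicle]
    return toll


print(net_daytime_toll("passenger", "Lincoln Tunnel"))            # 10.0
print(net_daytime_toll("passenger", "George Washington Bridge"))  # 15.0
print(net_daytime_toll("large_truck", "Holland Tunnel"))          # 16.0
```

The net figure is only the congestion charge. The tunnel's own toll is paid separately and is not modeled here.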

Public-sector employees (teachers, police, firefighters, transit workers, etc.), those who live in the so-called CBD, utility companies, those with medical appointments in the area and those who drive electric vehicles had all been hoping to be granted an exemption. They didn’t get one.

UFT President Michael Mulgrew, one of the lead plaintiffs in a federal lawsuit against congestion pricing, said following the MTA approval that now it’s the courts’ job to step in.

“Now that the MTA board has voted, it is going to be up to the courts to prevent the huge environmental injustice that threatens families outside the Manhattan congestion zone, including communities that are already suffering some of the worst air pollution and asthma rates in the country,” Mulgrew said.

Andrew Siff is a reporter for NBC New York. 

Street Wars

The Battle for the Streets of New York

Now more than ever, the city is being forced to rethink how its thoroughfares are used.

By Dodai Stewart. Illustrations by Leon Edler.

New York City streets and sidewalks have always been crowded, but it’s never been like this.

There are more people, more cars and more bicycles. And that’s not all.

Dining sheds are squeezed beside bike lanes. Home delivery has exploded, ushering in more e-bikes, cargo bikes and trucks.

It’s all crammed into streets laid out over 200 years ago. The result? A chaotic struggle for space unlike any the city has ever seen.

On a recent morning, the intersection of East 77th Street and Lexington Avenue presented a vivid illustration of the tumult.

A taxi trying to make a left turn had to maneuver around a Verizon crew digging up the asphalt. A box truck was parked in the bus lane, and the M102 bus, with its accordionlike belly, was forced to change lanes and snake around it.

Dozens of people streamed out of the subway and into the crosswalk. A man pushing a double stroller navigated between the subway entrance and a sidewalk compost box. A woman’s shopping cart wheels got stuck in a crack in the sidewalk. CitiBikes and delivery bikes whizzed by. A cargo bike stopped in front of a FedEx truck that was unloading packages next to a bike lane.

Lively, energetic streets make city living attractive — people to watch, windows to browse, benches to sit on, trees for shade.

But lately, New York City streets are teetering between lively and unlivable. Residents clash over traffic, noise, parking, 5G towers and heaps of trash. Most years, far fewer pedestrians get killed by motorists than in generations past, but last year was the deadliest year for cyclists since 1999.

Still, people who have thought deeply about the state of the city’s streets believe dramatic improvement may be on the way — if New York is willing to seize the moment.

That’s because the city is about to embark on the nation’s first congestion pricing plan, charging most drivers $15 to enter much of Manhattan below 60th Street — and forcing many commuters to find a different way into the city.

The aim is to reduce car traffic in one of the world’s busiest commercial districts and raise money for public transportation.

A city bus, a yellow taxi cab and a passenger car compete for space on East 86th Street.

People, bikes and vehicles compete for space on New York City’s streets.

Karsten Moran for The New York Times

“I think this could be the catalyst for a streets renaissance in New York,” Janette Sadik-Khan, New York City’s former transportation commissioner, said in a recent interview.

“We have to talk about how we’re going to reclaim that space and make it work for people.”

Of course, congestion pricing, too, comes with a fight. The plan is supposed to start in June, but it faces several lawsuits brought by elected officials and residents from across the region, who describe it as ill-conceived and unfair to commuters who drive because public transit isn’t robust enough to serve their needs.

“They don’t drive because they want to,” said Susan Lee, a member of a coalition called New Yorkers Against Congestion Pricing. “They don’t want to sit in traffic.”

Could congestion pricing actually reduce the number of cars in the city to a dramatic extent? If so, what would take their place?

There are other ideas and experiments in the works for taming New York’s streets, and they raise questions of their own. Could a proposal to ban parking close to intersections improve public safety? Will the Sanitation Department’s garbage containerization plan make sidewalks cleaner? Is there a way to keep package delivery trucks from blocking the streets? Must 5G technology create public eyesores in residential neighborhoods?

In the months ahead, The New York Times will examine the debates raging in neighborhoods all over the city about who and what gets to take up space on New York’s streets and sidewalks.

How did we get here?

Orchestrating the flow of traffic and pedestrians has been a complicated and emotional project for centuries.

New York City’s streets were laid out before anyone knew how they would ultimately be used — long before cars were even invented. The first city planners could not have anticipated Uber vehicles, let alone Amazon deliveries or commuters on electric scooters.

In New York’s earliest days, the streets were a free-for-all. People walked or rode horses. There were no crosswalks or stoplights; if you had to cross the street, you simply walked across the street.

Horse-drawn carts, streetcars and pedestrians compete for space on Broadway in 1859.

Traffic on Broadway in 1859 consisted of pedestrians, horse-drawn carts and streetcars.

William Notman, via Getty Images

Soon, horse-drawn vehicles used the streets alongside pedestrians, and people dashed between them. (Later, New Yorkers dodged streetcars in much the same way, giving the Brooklyn baseball team its name.)

The arrival of bicycles neatly encapsulated the city’s ever-shifting debate over how the streets should be used — and by whom.

By the 1890s, the streets were full of bikes. Men and women took to cycling through the city so quickly — and dangerously — that it was called “scorching.”

About 100 years later, in 1987, speeding bike messengers were deemed so dangerous that bicycles were banned from Midtown, temporarily.

Today, the city encourages residents and visitors to ride bikes. New York has bike lanes and a flourishing bike share program, plus an explosion of food delivery powered by e-bikes. The renewed popularity has also come at a grave cost: Last year 30 cyclists were killed on city streets, and 395 were severely injured.

“It’s hard to say whether it’s the best of times or the worst of times for bicycling,” said Jon Orcutt, the director of advocacy at Bike New York and the former policy director at the city’s Department of Transportation. “More people are doing it than ever.”

“If you’re not killed — squished like a bug — you can bike across town in 10 minutes,” he added. “It’s easy. It’s really efficient.”

Enter the car — and the car crash

On the evening of Sept. 13, 1899, Henry Hale Bliss, a 69-year-old real estate broker, was riding a Manhattan streetcar on his commute home.

At 74th Street and Central Park West, Mr. Bliss stepped from the streetcar and into the street, where he was immediately hit by a taxi. He died on the scene and is recognized as the first person in the United States to be killed by a car. There is a plaque at the intersection commemorating his death.

“At the end of the Gilded Age, right before World War I, suddenly, there were motor vehicles everywhere,” said James Nevius, an author and New York historian.

The development meant people could move around faster — but it also put more people in danger.

In 1920, there were about 200,000 registered vehicles in New York City; by 1925 that number had more than doubled. A century later, that figure is two million.

By the 1930s, cars dominated New York City streets, including here at the intersection of Park Avenue and 57th Street.

This scene of Park Avenue near 57th Street was typical of 1930s traffic. Over 10 million cars went through the Holland tunnel in 1930.

George Rinhart/Corbis, via Getty Images

And yet New Yorkers are still using the same streets that were laid out generations ago. In Manhattan, the rigid street grid was designed in 1811. Avenues are 100 feet across. Cross streets are 60 feet wide, including the space for sidewalks on both sides.

That’s 720 inches in which to fit not just cars but also pedestrians, baby strollers, trash, compost, scaffolding, bicycles, e-bikes, scooters, skateboards, package delivery trolleys, garbage trucks, delivery trucks, food carts, 5G towers, dining sheds, trees, CitiBike docks, buses, taxis, ambulances and on-street parking.

It’s like a giant game of Tetris — except all the pieces just won’t fit.

In fact, some of the pieces are growing larger: In the past decade, the average vehicle got 12 percent longer and 17 percent wider. (Cars’ blind spots have also gotten larger.)

And the number of pieces just keeps expanding. New York City’s population reached 8.8 million in 2020, and the New York region is now home to nearly 19 million people. The city’s population has dropped some in the past few years, but city officials believe that recent population estimates have significantly underestimated the number of newly arrived migrants, which, by some counts, is over 180,000.

Taming the streets

Even as New York’s streets and sidewalks have become more chaotic, there are also plenty of examples of the opposite: moments when the city has tamed the traffic and found new uses for its old spaces.

Over the past 10 to 15 years, sweeping pedestrian plaza initiatives, which detour cars and encourage space for sitting and strolling, have gradually changed the landscape, from the Jackson Heights neighborhood in Queens to Times Square.

Visitors to Times Square relax on lounge chairs.

Times Square was once full of traffic. In May 2009, the city closed Broadway to cars and set out lawn chairs, the start of the area’s transformation to pedestrian plaza.

Damon Winter/The New York Times

The Open Streets program restored pedestrian-first streets, free of cars and safe enough for strolling, chatting and letting kids ride bikes.

The coronavirus pandemic ushered in a chance to rethink public spaces, and the absolute quiet on the streets during lockdown was a reminder that the city isn’t inherently noisy, but traffic is.

And there are plenty of other places to look for inspiration: In Bogotá, Stockholm, London and Paris, certain streets are being closed to cars. There is an effort in Europe to avoid the oversize pickup trucks and SUVs that make American roads so deadly. Paris has designated “school streets” where cars have been removed to make way for children. Cycling is flourishing in Europe; emissions are down.

In New York, Ms. Sadik-Khan, the former transportation commissioner, is among the people thinking deeply about the future of streets — and she is optimistic.

“There’s a new generation of New Yorkers who’ve never known a city without protected bike lanes and bike share,” Ms. Sadik-Khan said. “More people than ever are working from home. Commuting patterns are in flux. There’s the opportunity to make a new deal for people getting around.”

What will a “new deal” look like? And will New Yorkers be on board?

No matter what happens, change doesn’t come without a fight — and many of the battles will be fought street by street and block by block.

Over the next few months, we will take a close look at some of these street fights — and we’re eager to hear about yours, too.

COMMENTS

  1. TLC Trip Record Data

    The trip data was not created by the TLC, and TLC makes no representations as to the accuracy of these data. For-Hire Vehicle ("FHV") trip records include fields capturing the dispatching base license number and the pick-up date, time, and taxi zone location ID (shape file below). These records are generated from the FHV Trip Record ...

  2. anushadatta/NYC-Taxi-Trip-Duration

    The Kaggle competition named "New York City Taxi Trip Duration" consists of the 2016 NYC Yellow Cab trip record data, which was originally published by the NYC Taxi and Limousine Commission (TLC). This competition demands us to build a model that predicts the total ride duration of taxi trips in New York City. Thus, the problem statement is ...

  3. New York City Taxi Trip Duration

    Share code and data to improve ride time predictions.

  4. NYC Taxi Trip Duration Prediction using Machine Learning

    The mean difference between predicted and actual duration is -739.25, i.e., a model based on yellow taxis predicts almost a ~12-minute lesser travel duration. One reason for the lower travel time ...

  5. New York City taxi trip duration prediction using MLP and XGBoost

    New York City taxi rides form the core of the traffic in the city of New York. The many rides taken every day by New Yorkers in the busy city can give us a great idea of traffic times, road blockages, and so on. Predicting the duration of a taxi trip is very important since a user would always like to know precisely how much time it would require of him to travel from one place to another ...

  6. This is a comprehensive Exploratory Data Analysis for the New York City

    This is a comprehensive Exploratory Data Analysis for the New York City Taxi Trip Duration competition with Python and Data Visualization libraries such as matplotlib and seaborn. I also use New York City Taxi with OSRM to support the primary dataset.. The goal of this playground challenge is to predict the duration of taxi rides in NYC based on features like trip coordinates or pickup date ...

  7. New York City Taxi Trip Duration

    In this competition, Kaggle is challenging you to build a model that predicts the total ride duration of taxi trips in New York City. Your primary dataset is one released by the NYC Taxi and Limousine Commission, which includes pickup time, geo-coordinates, number of passengers, and several other variables. Longtime Kagglers will recognize that ...

  8. A Guide to Taxi Cab Etiquette When Visiting New York City

    Payment Etiquette: Tips and Tolls. In New York City, tipping your taxi driver is customary and appreciated as a gesture of thanks for good service. A tip of 20% of the fare is standard, though you may choose to tip more for exceptional service or convenience. For rides that require tolls, the tolls are typically added to the final fare, and ...

  9. NYC Taxi Trip Time Prediction

    Developed various models to predict the total ride duration of taxi trips in New York City. Data Description: The dataset is based on the 2016 NYC Yellow Cab trip record data made available in BigQuery on Google Cloud Platform.

  10. Cab Etiquette In NYC: All You Need to Know

    Yes, but only if the trip is more than 12 hours long, or if their 'taxi' light is off. 12 hour+ journeys are against the law in the US, and only taxis with their lights on are currently working. If you're staying far out of the city centre, perhaps get in the cab before telling them where you're going.

  11. Exploring NYC Taxi Data (Updated)

    Scatterplot of all pickups and dropoffs in New York City Summary. This post explores a subset of the NYC taxi dataset for the month of April 2013. I extract, transform and load the trip fare and trip details csv files into a sqlite database. I use this data to predict the fare and tip taxi drivers will receive.

  12. The Ultimate New York City Taxi Guide

    What a taxi costs. I remember not too long ago (or so it seems) when taking taxis in New York a decent distance would cost $5 to $7. Those days are long gone. The base fare is $2.50 with 50 cents added every 1/5 of a mile or 60 seconds of slow traffic or stop time.

  13. New York City Taxi Trip Duration Prediction Using Machine Learning

    New York City Taxi Trip Duration Prediction Using Machine Learning. May 2023. DOI: 10.22214/ijraset.2023.52768. Authors: Nandeshvar R K. Dr. Janaki K. Avin Joseph. K Sakthivel. Show all 5 authors.

  14. New York City taxi trip duration prediction using MLP and XG

    Downloadable (with restrictions)! New York City taxi rides form the core of the traffic in the city of New York. The many rides taken every day by New Yorkers in the busy city can give us a great idea of traffic times, road blockages, and so on. Predicting the duration of a taxi trip is very important since a user would always like to know precisely how much time it would require of him to ...

  15. Exploratory Data Analysis on NYC Taxi Trip Duration Dataset

    There is no visible relation between trip duration and passenger count. Trip duration per hour: sns.lineplot(x='pickup_hour', y='trip_duration', data=data). We see the trip duration is at its maximum around 3 pm, which may be because of traffic on the roads. Trip duration is lowest around 6 am, as streets may not be busy.

  16. New York City taxi trip duration prediction using MLP and XGBoost

    Since the duration of the taxi trip is highly dependent on the time at which the trip is made, the prediction becomes highly complex. In this regard, we have taken into account the time of the trip for reliable predictions. Also, we have excluded co-ordinates of locations present outside New York City because of their outlying nature.

  17. Exploratory Data Analysis of New York Taxi Trip Duration ...

    trip_duration: (target) duration of the trip in seconds. Thus we have a data set with 729322 rows and 11 columns. There are 10 features and 1 target variable, which is trip_duration.

  18. New York City taxi trip duration prediction using MLP and XGBoost

    Poongodi et al. (Poongodi et al., 2021) presented a trained XGBoost model that was able to predict the taxi trip durations having an RMSE value of 0.39, and concluded that XGBoost performed better ...

  19. Linear Regression Model on the New York Taxi Trip Duration ...

    The average speed of a taxi in New York City is about 11 km/hour. The data has several data points with a speed way beyond that. We will now have a look at the distribution of the distance ...

  20. Predicting New York Taxi Trip Duration Based on Regression ...

    If route A is X kilometres longer but gets you there Y minutes faster than route B would, one would take route A over B. The New York City Taxi and Limousine Commission (TLC) deals with the licensing of taxicabs operated by private companies in New York, along with overseeing about 40,000 other for-hire vehicles.

  21. NYC Taxi Trip Duration Prediction using Machine Learning

    The mean difference between predicted and actual duration is -739.25 i.e. a model based on yellow taxis predicts almost a ~12 minute lesser travel duration. One reason for the lower travel time in ...

  22. TRIP DURATION PREDICTION: NEW YORK TAXI RIDES USING XGBoost (NYC Taxi

    Dataset: New York City Taxi Duration Dataset. The dataset is based on the 2016 NYC Yellow Cab trip record data made available in BigQuery on Google Cloud Platform. It comprises a training set (1458644 trip records) and a testing set (625134 trip records).

  23. NYC taxi drivers could get charged full congestion pricing toll

    Some New York City taxi drivers could get hit by the full congestion pricing charge if one of the city's two taxi technology companies doesn't play ball with the MTA. Curb, one of the taxi ...

  24. NYC's best and worst times to travel on Memorial Day Weekend

    Thursday, May 23: Worst travel time 12 - 6 p.m. | Best travel time before 11 a.m., ... AAA says booking data shows New York City is the #3 domestic travel destination for the holiday weekend.

  25. New York City congestion pricing, first in the nation, is approved at

    Small trucks (like box trucks, moving vans, etc.): $24. Large trucks: $36. Motorcycles: $7.50. The $15 toll is about a midway point between previously reported possibilities, which have ranged ...

  26. NYC Taxi Trip Duration

    Explore and run machine learning code with Kaggle Notebooks | Using data from New York City Taxi Trip Duration

  27. Enter the car

    In 1920, there were about 200,000 registered vehicles in New York City; by 1925 that number had more than doubled. A century later, that figure is two million. This scene of Park Avenue near 57th ...